WHAT WORKS


The Promise of Behavioral Design 


As late as 1970, only 5 percent of musicians performing in the 
top five orchestras in the United States were women. Today, 
women compose more than 35 percent of the most acclaimed or- 
chestras, and they play great music. This did not happen by chance. 
Rather, it required the introduction of blind auditions. The Boston 
Symphony Orchestra was the first to ask musicians to audition 
behind a screen, and in the 1970s and 1980s most other major 
orchestras followed suit. When they did so, usually in preliminary 
rounds, it raised the likelihood that a female musician would ad- 
vance by 50 percent and substantially increased the proportion of 
women hired.1

In theory, an orchestra director cares about the sounds coming 
out of the bassoon, the flute, and the trumpet, not the ethnicity 
or sex of the person playing the instrument. In practice, the Vi- 
enna Philharmonic, for example, admitted its first female player 
in 1997. Not so long ago. Orchestra directors and selection
committees were quite comfortable with all-male, all-white
orchestras and likely not aware of their biases. To change this, no great
technological feat was required, just awareness, a curtain, and a
decision. Or, more precisely, a design decision. A simple curtain 
doubled the talent pool, creating amazing music and transforming 
what orchestras look like. But why did it take so long? 

Consider the following image and compare squares A and B. 
What do you see? 


Checkershadow illusion, part 1. 


Most people see square B as being lighter than square A. It turns 
out that this is an illusion. Your mind made sense of the pattern it 
saw, a checkerboard. You put squares into categories, dark and 
light, and put them in order: light squares next to dark squares. 
You may also have taken the shadow into account and made sure 
it did not trick you into not seeing a pattern that you knew had to 
be there. 

Consider the same checkerboard now, with square B isolated. 


Note that squares A and B in fact have the same color. They are
both dark. By blocking some of the checkerboard, we allowed your
mind to see square B for what it is—another dark square. It no 
longer had to be in a certain category and obey certain rules. It was 
liberated from the patterns we expect, just as curtains liberated 
orchestra selection committees. Professional musicians typically 
are quite shocked when they learn how much they are influenced 
by visual cues. A recent series of experiments showed that
competition judges consciously value sound as central to their
decision. Only the experimental evidence shows them that, in fact,
they are instead relying heavily on visual cues.2


Checkershadow illusion, part 2. 


Consider another, quite different example. A study examining 
the parole rulings of Israeli judges found that they ruled far more 
leniently right after meal breaks. Differing degrees of leniency 
were the unintended consequence of hunger, fatigue, the
depletion of cognitive resources—and design. Just prior to taking a
break, the judges reverted to the easy solution: the status quo. After
a break, they were more deliberative. The timing and number of 
breaks the judges took—the design—had unintentional conse- 
quences. Bad designs, whether consciously or unconsciously chosen, 
lead to bad outcomes. Bias is built into our practices and
procedures, not just into our minds. Here is our opportunity.3

This book’s goal is to offer you good designs: designs that
make it easier for our biased minds to get things right. Based on
research evidence, we can change the environments in which we 
live, learn, and work. My principal focus here is the stubborn, 
costly problem of gender inequality, but the recommendations 
I make stem from a wealth of research about decisions and be- 
havior that go well beyond gender. The book takes as a given that 
people make mistakes; they make them often and (sometimes) 
unknowingly. As a consequence, these mistakes reduce every- 
one’s well-being. The solutions I recommend come from the 
field of behavioral economics—putting up screens, timing breaks 
well, and dozens of more and less complicated interventions— 
all building on insights into how our minds work. My invitation 
to you is to become a behavioral designer—because it works, 
because it often is rather easy and inexpensive, and because it 
will start to level the playing field and give everyone greater op- 
portunity to thrive. 

Much like interior designers or landscape architects, behavioral 
designers create environments to help us better achieve our goals. 
They do not define goals, but they help us get there. Referred to 
as “choice architecture” in Richard Thaler and Cass Sunstein’s 
path-breaking book Nudge, behavioral design goes beyond law, 
regulation, or incentives, although it acknowledges that these are
and will remain important. But they do not always work. Based
on 41 million observations for the population of Denmark, for
example, research shows that tax subsidies have only a tiny im- 
pact on savings. Such incentives require people to take action 
and respond—which 85 percent of Danes fail to do. In contrast, 
behavioral designs that do not rely on people reacting to incen- 
tives but instead employ automatic mechanisms—such as auto- 
matic employer contributions to retirement accounts—do much 
better. They substantially increased the amount of money retirees 
have available. We do not always do what is best for ourselves, for 
our organizations, or for the world—and sometimes, a little nudge 
can help.4

A simple curtain transformed what orchestras look like and 
doubled the talent pool. Benefiting from 100 percent talent is good 
business for orchestras and just about every other organization. 
Careful timing of breaks allows judges to make decisions more ac- 
curately and fairly. To the business case, then, we must add the 
moral case: behavioral design is the right thing to do. 

There is no design-free world. Organizations have to decide 
how to search for and select future employees. How they adver- 
tise open positions, where they post the job openings, how they 
evaluate applicants, how they create a short list, how they inter- 
view candidates, and how they make their final selections are all 
part of choice architecture. Why not design a bit more thought- 
fully, increasing the chances that the best people are hired? 

This book will show you how. Our research suggests, for ex- 
ample, that asking hiring managers to explicitly compare a given 
candidate with real alternatives makes evaluators focus on individual 
performance instead of stereotypes. Comparing two or more job 
candidates helps evaluators calibrate their judgments—without
having to rely on an internal stereotype as a measuring rod. As
academic dean of the Kennedy School of Government, I introduced
this and other insights shared in this book to the faculty hiring 
and promotion procedures at Harvard University. 

It bears repeating that design is everywhere. We constantly 
make choices about how to present information, structure inter- 
views, or create teams, and we live day in, day out with the con- 
sequences of those choices. Whether or not employees are asked 
to “opt in” or “opt out” of a pension plan might well determine 
whether or not they have enough money to enjoy retirement. How 
your company hires and promotes might well determine bottom- 
line performance. By changing the design, we change the out- 
come: good design can lead to positive outcomes—nudge by 
nudge. We begin by uncovering the root causes for certain be- 
haviors and designing interventions accordingly. These root causes 
include one difficult truth: no one is immune from biases. 

A few years ago I entered a day-care center at my workplace, 
Harvard University. I had our young son in my arms. Like mil- 
lions of parents who have taken their child to a caregiver for the 
first time, I was extremely anxious. One of the first teachers I saw 
was—a man. I wanted to turn around and run. How could I en- 
trust this man with the most precious thing in my life? He did not 
conform to my expectation of what a loving, caring, and nur- 
turing preschool teacher looked like. My reaction was not based 
on a conscious thought process, but rather on something deep in 
my gut. Was I being sexist? I fear the answer is yes. 

Thankfully, I overcame my biased snap judgment, the teacher 
proved great, and he became a trusted caregiver. But to this day 
my gut reaction bothers me. Only about 10 to 20 percent of the 
elementary school teachers in the United States and many other
countries are male. These men face an uphill battle. Just as in
orchestras, there is likely an untapped talent pool of elementary
school teachers. What is more, society’s failure to draw on that 
pool of talent matters. A 2015 study by the Organisation for 
Economic Co-operation and Development (OECD) finds that 
at age fifteen, boys are 50 percent more likely than girls to lack 
basic proficiency in reading, mathematics, and science. The 
presence of male role models can impact what boys believe 
possible and important for themselves: seeing is believing. 

Stereotypes serve as heuristics—rules of thumb—that allow us 
to process information more easily, but they are often inaccurate. 
What is worse, stereotypes describing how we believe the world 
to be often turn into prescriptions for what the world should be. 
Much psychological research shows that we cannot help but put 
people (and other observations) into categories. It rarely is a con- 
scious thought process that informs our thinking about demo- 
graphic groups. Rather, when we learn the sex of a person, gender 
biases are automatically activated, leading to unintentional and
implicit discrimination.5

Through behavioral design we can move the needle toward 
creating equal opportunities for female musicians, for male teachers, 
and for everyone else. Good design often harvests low-hanging 
fruit, left on the tree not so much because of bad intentions but 
rather because of the mind bugs that affect our judgment. Behav- 
ioral design offers an additional instrument for our collective 
toolbox to promote change; it complements other approaches fo- 
cusing, for example, on equal rights, education, health, agency, or 
on policies making work and family compatible. 

Much has been written about the “business case” for gender 
equality, and research continues to accumulate. One clear
insight is that the answer to what degree closing gender gaps yields
economic returns is difficult to determine if outcomes are based
on flawed decision processes. Take the example of orchestras. I 
presume that orchestras benefited from the introduction of blind 
auditions because curtains allowed evaluators to choose the best 
performers and build the best team—which also increased the 
fraction of women. 

It is a trivial point but one largely overlooked in the literature. 
Whether or not the share of women and men in groups, say, cor- 
porate boards, is related to company performance does not depend 
only on the percentage of each gender on the board but also on how 
the board members, women and men, are chosen, how the boards 
are organized, and what the rules of engagement and decision 
making are. Gender equality is not just a numbers game. Numbers 
matter, but how those numbers came to be and how they work 
with each other is quite possibly even more important. 

Still, we have learned a lot about the business case for gender 
equality. A recent study measuring the impact of an increase in 
the talent pool on the US economy between 1960 and 2008 found 
that aggregate output per worker had grown by 15 to 20 percent 
due to an improved allocation of talent. For example, while in 
1960 the effective talent pool for doctors and lawyers consisted of 
white men—94 percent of all doctors and lawyers in the United 
States were white men—this had changed dramatically by 2008, 
when the fraction of white male doctors and lawyers had de- 
creased to 62 percent. Casting the net more widely, including 
women and African, Asian, Hispanic, and Native Americans, had 
paid off.6

Leveling the playing field to include more women in the labor 
force is of vital economic importance for various countries.
Consider Japan. The OECD estimates that if it does nothing to
increase the labor force participation rates of its women, and these
remain at their 2011 levels of 63 percent for women and 84 percent 
for men, the country’s labor force will shrink by more than 
10 percent during the next twenty years. In contrast, if Japan 
achieved gender parity in its labor force, its gross domestic product 
(GDP) would increase by almost 20 percent over the next twenty 
years. High returns as a result of women’s economic inclusion are
not just a Japanese phenomenon but generally shared by countries
with low fertility rates, including Germany, Italy, Singapore, South
Korea, and Spain, among others.7

A simulation assuming that women are completely excluded 
from the labor force found that this would lead to income per capita 
losses of almost 40 percent. When labor market data for 126
countries from the International Labour Organization (ILO) are used to
calculate the actual gender gaps in workforce participation (as well as
in self-employment and pay, if available) in various regions of the
world, the total income losses are largest (27 percent) in the Middle
East and North Africa. In addition, for an increasing number
of countries, the talent argument has gained importance as the 
gender gap in education has reversed and more women than men 
graduate from college. In the United States, for example, more 
than half of bachelor’s degrees have been held by women since the 
mid-1980s, and by the early twenty-first century, almost 60 percent 
of bachelor’s degree holders were female. 

While economists still debate the exact magnitude of the im- 
pact of increasing women’s labor-force participation on GDP, we 
can safely agree with Christine Lagarde, managing director of the 
International Monetary Fund (IMF), that “excluding women 
simply makes no economic sense—and including women can be
a tremendous boon to the 21st century global economy.”8

On a micro-level, women have been found to put money to
more productive use than men in several cases. In Ivory Coast, for 
example, there are “male” and “female” crops. Men grow coffee, 
cocoa, and pineapple; women grow plantains, bananas, coconuts, 
and vegetables. In years where the men’s crops have high yields, 
research shows, households spend more money on alcohol and 
tobacco. When the women have good harvests, in contrast, more 
money is spent on food. In the United States, interesting micro- 
evidence on the relevance of women’s inclusion stems from labo- 
ratory experiments measuring a group’s “collective intelligence” 
across a variety of tasks. Gender-diverse teams scored more highly 
on collective intelligence than all-male or all-female teams. Im- 
portantly, a group’s collective intelligence was only moderately 
related to members’ individual intelligence, suggesting that a 
gender-diverse team can indeed be more than the sum of its parts.9

While the macro- and the micro-evidence hold the promise 
of a business case, gender equality is not a magic bullet automati- 
cally leading to economic progress. This is why, at the end of the 
day, the case for gender equality must rest on a moral argument. It
just is the right thing to do. Full stop.10

We cannot afford to get it wrong. In the most extreme case, 
getting it wrong is a matter of life and death. The United Nations 
estimates up to 200 million women and girls are missing world- 
wide as a result of sex-selective abortion, infanticide, neglect during 
the first five years, gendered violence, and discrimination later in 
life. This selective killing of members of a particular sex, referred 
to as “gendercide,” might well be the greatest human rights tragedy 
in history. If the same number of females were “missing” in the 
United States, America would be a men-only country. Nick Kristof
and Sheryl WuDunn remind us that this number exceeds all the
men killed on the battlefield in all twentieth-century wars. 

While horrendous by itself, gendercide has further conse- 
quences. In January 2010, the Chinese Academy of Social Sci- 
ences calculated that in 2020, one in five Chinese men would not 
be able to find a bride. The Academy expects a surplus of about 
30 to 40 million young men without marriage prospects in 2020— 
which corresponds to nearly the whole young male population 
in the United States. A low female-to-male ratio has been shown 
to lead to an expansion of the marriage market by decreasing the 
age of brides, hampering their educational attainment and 
economic opportunities. It has also been linked to an increase 
in trafficking of girls, domestic violence, honor killings, and other 
crime.11

A problem that many fear too big to even start addressing in- 
spired an amazing experiment by Rob Jensen, a former colleague 
and now professor at the Wharton School of the University of 
Pennsylvania. He examined what impact seeing economic oppor- 
tunities for women in rural India had on how parents treated 
their daughters. Jensen exploited the fact that the business process 
outsourcing industry grew rapidly in India during the 1990s and 
created a significant number of new jobs, in particular for women. 
With the help of a recruitment firm, he provided three years of 
recruitment services to women in randomly selected rural villages. 
He then compared whether these women were more likely to 
work than their counterparts in control villages. Jensen also inves- 
tigated whether this translated into a change in how parents 
treated their daughters. Indeed, the recruitment services
significantly increased employment among women (without affecting
men). In addition, in the villages chosen to receive recruitment
services, girls aged five to fifteen experienced a substantial
improvement in health and were significantly more likely to be in school.

Seeing women work in call centers allowed parents to imagine 
a different future for their own daughters. While the number of 
women newly working in call centers was relatively small (an in- 
crease of 2.4 percentage points), even this small possibility chal- 
lenged parents’ beliefs and their stereotypes about what women 
could accomplish.12

Behavioral design can affect faculty hiring in Cambridge, Mas- 
sachusetts, and create counterstereotypical role models in rural 
villages outside of New Delhi, India. These are just two of the 
places where these insights are helping people get it right for them- 
selves, their organizations, and the world. A tall order, you might 
think. I do not deny conflicts of interest or trade-offs in this book. 
Some games are zero-sum, and your gain will indeed be my loss. 
But not every game ends with a winner and a loser. Many games 
are positive sum, and here behavioral design is less like playing 
chess and more like dancing. We can improve girls’ health, edu- 
cation, and opportunities in India without harming those of boys. 
And we can select job candidates in organizations across the world 
based on individual performance rather than group stereotypes, 
increasing both efficiency and equality. 

How can we know that a particular design is effective? We 
can try different strategies and measure their impact. We can ex- 
amine the effectiveness of behavioral design much like we eval- 
uate the impact of a new drug, running a clinical trial in which 
people, schools, or even villages are randomly assigned to
treatment or control groups. The goal of random assignment is to create
groups that are as identical as possible, so that any change in
behavior can be attributed to the “treatment.” Indeed, much of the
evidence discussed in this book will be based on such randomized
controlled trials, allowing us to establish a causal pathway from the
design intervention to the outcome.
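
The logic of random assignment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a procedure taken from any study discussed here: the village names, the outcome numbers, and the helper functions `randomly_assign` and `difference_in_means` are all hypothetical.

```python
import random
from statistics import mean


def randomly_assign(units, seed=0):
    """Shuffle the units and split them into treatment and control halves."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(outcomes, treatment, control):
    """Estimate the treatment effect as the gap between group averages."""
    treated_avg = mean(outcomes[u] for u in treatment)
    control_avg = mean(outcomes[u] for u in control)
    return treated_avg - control_avg


# Hypothetical example: ten villages, half of which receive an intervention.
villages = [f"village_{i}" for i in range(10)]
treatment, control = randomly_assign(villages)

# Made-up outcome data (say, a school enrollment rate), for illustration only.
outcomes = {v: (0.60 if v in treatment else 0.50) for v in villages}
effect = difference_in_means(outcomes, treatment, control)
print(f"estimated effect: {effect:.2f}")
```

Because chance alone decides who lands in which group, the two groups should differ systematically only in whether they received the intervention, which is what licenses reading the gap in averages as the effect of the design.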

Thankfully, experimentation is becoming increasingly popular. 
More and more governments are designing policy interventions 
in collaboration with social science researchers, allowing them to 
evaluate their impact. Corporations are using advanced technolo- 
gies and social media to test different marketing strategies and 
human resource practices. And nongovernmental organizations are 
running scientifically valid experiments to explore how to decrease 
homelessness or recidivism most effectively. Still, we should do 
more. At all levels, we need to create learning environments where 
people are encouraged to try out something new, possibly fail, and 
then learn from it. 

This fear of trying new things and failing is a real constraint. 
It is also the one that I had underestimated most. Learning is my 
business and, naively, I expected everyone to be keen on uncov- 
ering past mistakes and improving their decision making. How- 
ever, in some organizations, acknowledging past errors is risky. 
Thus, while the CEO or the president might be enthusiastic 
about discovering mistakes and piloting a new idea, managers at 
all levels might well feel threatened. To circumvent this, govern- 
ments and corporations must create safe spaces for experimenta- 
tion where mistakes are taken as an opportunity to learn. 

In this book, I offer dozens of opportunities to try something 
new. These interventions mostly relate to gender, but sometimes I 
draw on research examining how to promote equality for other 
traditionally disadvantaged groups. Some of the same design
features that level the playing field between men and women can
also inform our thinking about other groups. But while we
should learn from each other, from research on race in the United 
States or castes in India, we need to be mindful that findings do 
not automatically generalize. Rather, evidence from one field 
should serve as an invitation to experiment with similar design 
features in another. 

Despite media attention paid to general issues of race and 
gender, we still know relatively little about the intersection be- 
tween different social categories—for example, to what degree 
evidence on white women also applies to African, Asian, His- 
panic, or Native American women. Similarly, the research on fal- 
tering academic achievement among boys and men, and what to 
do about it, is relatively young. A series on gender, education, and 
work in the Economist in the spring of 2015 highlights the chal- 
lenges poorly educated men in the United States and elsewhere 
face. They are falling behind not only in school but also in work 
and society more generally. The series calls for a “change in cul- 
tural attitudes”: “Men need to understand that traditional manual 
jobs are not coming back, and that they can be nurses or
hairdressers without losing their masculinity.”13

Bias hurts counterstereotypical individuals across gender, race, 
class, ethnicity, nationality, or caste. Consider this. Simulations 
show that even a tiny bias in performance evaluations can lead to 
huge disparities in representation at the highest levels. Assuming 
the typical corporate pyramid structure where only a few make it 
to the top, and holding everything else constant, one simulation 
found that a bias accounting for only 1 percent of the variance in 
evaluation scores led to only 35 percent of the discriminated- 
against group being represented at the top. Without the bias, each 
group would have held 50 percent of these seats.14
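
To see how a small bias can compound through a pyramid, consider the following minimal Python sketch. It is not the model used in the simulation cited above; it is a simplified illustration in which every detail is an assumption: six promotion rounds, half of the pool advancing each round, fresh evaluation scores drawn each round, and a hypothetical function named `share_at_top`.

```python
import random


def share_at_top(bias, levels=6, per_group=20000, keep=0.5, seed=1):
    """Return the disadvantaged group's share of those left after all rounds.

    Each round, every remaining person gets a fresh evaluation score drawn
    from a standard normal distribution; members of the favored group also
    get `bias` added. The top `keep` fraction advances to the next level.
    """
    rng = random.Random(seed)
    # True marks the favored group, False the disadvantaged group.
    pool = [True] * per_group + [False] * per_group
    for _ in range(levels):
        scored = [(rng.gauss(0.0, 1.0) + (bias if favored else 0.0), favored)
                  for favored in pool]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        survivors = scored[: int(len(scored) * keep)]
        pool = [favored for _, favored in survivors]
    return pool.count(False) / len(pool)


# With two equal-sized groups, a 0.2 standard-deviation bonus explains about
# 1 percent of the variance in scores (0.2**2 / (0.2**2 + 4) is roughly 0.0099).
unbiased = share_at_top(bias=0.0)
biased = share_at_top(bias=0.2)
print(f"share at top without bias: {unbiased:.2f}")
print(f"share at top with bias:    {biased:.2f}")
```

In this sketch the disadvantaged group ends up with roughly half of the top positions when there is no bias and with clearly less than half when the small bonus is applied at every round. The exact figure depends on the assumed number of levels and the cut at each level, so it should not be read as a replication of the published result.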

Going with your gut can have real effects. The first part of
the book explores this further. It helps us better understand the 
problem—why gender bias is so prevalent and why it is hard to 
overcome by training alone. It explores approaches focused on 
de-biasing mindsets through diversity training and on helping 
women navigate the system, compete more effectively, negotiate 
more assertively, and lead more strategically. Women need to know 
how and when to “lean in” as Sheryl Sandberg eloquently
described in her book. But a review of women’s empowerment
initiatives suggests that women will not be able to do it alone.15

The remainder of the book focuses on the solutions behavioral 
design offers. The second part introduces new designs for talent 
management. It is devoted to the importance of evidence, con- 
tinuing the theme of experimentation but also arguing for im- 
proved data collection by gender and the use of big data. One of 
the more recent applications of big data has been in the area of 
people analytics. Generally, people analytics argues that we can learn 
more about, say, a given job applicant’s likelihood of leaving the 
firm within the first year by analyzing the characteristics of cur- 
rent leavers and stayers than by clever tests or intricate interviews 
with the candidate. Replacing intuition, informal networks, and 
traditional rules of thumb with quantifiable data and rigorous 
analysis is a first step toward overcoming gender bias. Successful 
for-profit and not-for-profit organizations such as Credit Suisse, 
Goldman Sachs, Google, LinkedIn, Microsoft, and Teach for 
America increasingly run their HR departments like they run their 
finance or marketing departments, based on evidence. Some now 
refer to them as “people analytics departments.”16

We also need to scrutinize the messages we send to people who 
consider joining our organizations. Do we attract the right or the
wrong ones? Who opts in and who opts out? Is there gender
bias in how we advertise job openings and describe the qualifica- 
tions and characteristics we look for in a future employee? Do 
schools and universities encourage a broad set of applications or 
do we, consciously or unconsciously, send messages that deter cer- 
tain groups of people from applying? Insights into gender differ- 
ences in preferences for, say, competition or uncertainty, and 
self-stereotypes regarding aptitude for certain subjects, disciplines, 
or jobs help shape the signals we send, increasing the chances that 
people perceive the opening as an invitation to apply. 

Much more can be done. The third part of the book dissects 
the environments in which people live, learn, and work for unin- 
tended biases. Are the portraits that hang in the hallways of your 
organizations only of past male leaders? Know that this is im- 
pacting what employees or students believe possible for them- 
selves. Stereotypes can be activated by the most subtle cues, in- 
cluding whether or not test-takers are asked to check off boxes 
indicating their sex or race before taking a test. Stereotypes
prescribing that Asians outperform whites in math and that girls do
better than boys in reading and writing can become self-fulfilling
prophecies—unless we de-bias how we do things.

Finally, in addition to redesigning how we manage talent and 
craft school and work environments, we can also apply behavioral 
insights to make diversity work better. The fourth part of the 
book shows that contact with other social groups can change ste- 
reotypical beliefs and help people collaborate across groups. But 
not all groups are created equally. Having a “critical mass” of every 
subgroup represented in a team has been shown to be crucial to 
team success. Success is also promoted by the use of specific
design principles defining the rules of engagement or
decision-making. By choosing the “right numbers” and the “right
procedures,” you can help teams perform better.

These are some of the tools and techniques, often low-hanging 
fruit, behavioral design offers to improve our classrooms and board- 
rooms, tests and performance evaluations, hiring and promo- 
tions, and policy- and decision-making. Based on evidence from 
experimental studies in most cases, this book shows that small 
changes can have surprising effects. Big data improves our
understanding of what is broken and needs fixing, blind or comparative
evaluation procedures help us hire the best instead of those who 
look the part, and role models shape what people think is possible. 
Building on what works, behavioral design creates better and fairer 
organizations and societies. It will not solve all our gender-related 
problems, but it will move the needle, and often at shockingly low 
cost and high speed. 


Part One 


THE PROBLEM 


Unconscious Bias Is Everywhere 


Meet Howard Roizen, a venture capitalist, former entrepreneur, 
and proficient networker. A case study taught at many business 
schools describes how he became a power player in Silicon Valley. 
He co-founded a very successful tech company, then became an 
executive at Apple, subsequently turning his attention to venture 
capitalism. Most recently, he became a member of the boards of 
directors of several prestigious companies. He is a friend of Bill 
Gates and was close to Steve Jobs. He maintains one of the most 
extensive networks in Silicon Valley. 

After studying the case, students were asked to evaluate How- 
ard’s performance. They rated him as highly competent and ef- 
fective. They also said that they liked him and would be willing 
to hire him or work with him. 

But Howard does not actually exist. His real name is Heidi. 
He is a woman. And when studying the absolutely identical case 
with the protagonist being female, students find Heidi as
competent and effective as Howard, but they no longer like or want to
work with this successful entrepreneur and venture capitalist.

My friend Kathleen McGinn of Harvard Business School
originally wrote the case study about Heidi Roizen in 2000 to
highlight the steps taken by one successful entrepreneur to build and
leverage personal and professional networks. A few years later, I 
encountered it again in a research seminar. Two professors had 
given half of their students McGinn’s original case, which accu- 
rately identified Heidi, and the other half the same case, but with 
“Howard” swapped in for Heidi. This allowed them to compare 
how students felt about Heidi and Howard.1

Many business schools have since run the experiment, using it 
as a pedagogical tool to help their MBA students experience gender 
bias. Afterward, the students realize that the prototypical leader 
in their minds is male. Heidi does not look or act the part: she 
cannot be competent and likable at the same time. What is cele- 
brated as entrepreneurship, self-confidence, and vision in a man is 
perceived as arrogance and self-promotion in a woman. 

Women can’t win. If they conform to the feminine stereotype 
of nurture and care for others, they tend to be liked but not re- 
spected. Dozens of studies have now demonstrated that women 
face a trade-off between competence and likability. Women in ste- 
reotypically male domains encounter backlash at every juncture: 
when getting hired, compensated, and promoted. Psychologists 
believe that these negative reactions are due to a clash between 
our stereotypical perceptions of what women are or should be like 
(their gender roles), and the qualities we think are necessary to per- 
form a typically male job. If women like Heidi demonstrate that 
they can do a “man’s job,” they no longer fit our mental model of 
the “ideal woman.” They violate norms, and people do not find 
norm violators appealing. Put differently, women who violate
norms pay a social price.²

Most studies on the topic have been run with white men and
women, mostly in the United States. We know little about whether 
the same dynamics apply to other races and ethnicities. What we 
do know is typically based on small samples, for the experiments 
have not been replicated widely. Thus we need to interpret them 
with care. While there exists no comprehensive study of whether 
the agency penalty established for white, female Americans also 
applies to African, Asian, Hispanic, and Native Americans, Robert
Livingston and colleagues examined the question for African 
Americans. They found that black women are considered neither 
prototypical women nor prototypical blacks. Does this buffer them 
from some of the gender stereotypes white women experience? 
Indeed, experimental evidence showed that black women did not 
experience the same kind of backlash as white women when they 
expressed dominance rather than communality. In contrast, domi- 
nant African American men were penalized while white American 
men were not. If African American men are perceived as nonthreat- 
ening, they benefit. Physical features expressing warmth and defer- 
ence have been shown to be an advantage for black male CEOs, but 
these same attributes hurt white male CEOs in the United States. 

These findings are in contrast to models of double jeopardy,
which assume that people with multiple “subordinate identities,” for
example, African American women, are subject to more preju- 
dice than those with only one—African American men or white 
women. Identities appear to be not only additive but intersecting 
in ways that current research is starting to uncover. Erica Hall and 
collaborators suggest that a person’s gender profile is composed of 
the “genders” of a person’s sex and race and that we should take 
this gender profile into account to better understand gendered
perceptions of occupational “fit.”³

Gender stereotypes appear to generalize to some degree across
cultures. For example, many societies differentiate warmth from 
competence when judging social groups; high-status groups are 
stereotypically perceived as competent but lacking in warmth, and 
the male stereotype is associated with the cultural ideal. The per- 
vasiveness of these stereotypes has real effects on how people are 
evaluated.⁴

Again and again, the pattern that has emerged from experi- 
ments assessing women who are performing stereotypically male 
jobs—an aircraft company’s assistant vice president for sales, for 
example—looks like this: 


• When performance is observable, successful women are
rated as less likable than men.

• When performance is ambiguous, successful women are
rated as less competent than men.


In the latter case, when evaluators cannot easily measure quality, 
they fill in the blanks with stereotypes. In one recent study, Katy 
Milkman, Modupe Akinola, and Dolly Chugh sent thousands of 
professors at academic institutions across the United States an email 
from a phantom student requesting a ten-minute meeting the fol- 
lowing week to learn more about a doctoral program the professor 
was involved in. The name of the student, however, varied: some 
were self-evidently male, others female; each sounded either white, 
African American, Hispanic, Indian, or Chinese. Almost 70 percent 
of the professors responded, and most agreed to meet with the stu- 
dent. However, they were significantly less likely to respond to a 
student who was not a white male than to a white male student. The 
bias was most pronounced in the field of business administration,
with 87 percent of white men but only about 62 percent of all the
women and students of color combined receiving a response. The 
professor’s own demographic characteristics generally did not 
matter—Hispanic female professors were as likely to favor white 
male students as white male professors were (the one exception being 
Chinese students who requested a meeting with a Chinese pro- 
fessor). Did the professors, likely unconsciously, perceive the white 
men as more competent or more deserving of their attention?⁵

Another field experiment is illuminating. A man and a woman 
applied for a laboratory manager position at a university. They 
were otherwise identical and had the same qualifications. Science 
faculty rated the male candidate as significantly more competent 
than the female candidate and were more likely to hire him. The 
faculty’s pre-existing bias against women affected their evaluation. 
In a further exploration of gender biases associated with STEM 
fields (science, technology, engineering, and mathematics), re- 
searchers had people hire candidates to perform a specific job: a 
well-defined arithmetic task on which both genders perform 
equally well. When the evaluators knew nothing except the can- 
didates’ gender, men were about twice as likely to be hired as 
women. The bias was barely affected, however, when the candi- 
dates were allowed to provide information on their qualifications. 
In keeping with other findings, when the male candidates did so 
they tended to boast, and when the female candidates did so they 
tended to under-report how good they were, behaviors evaluators 
did not take into account. Only information about how well the 
candidates had done on the task in an earlier round helped reduce 
the bias. But even this precise information was unable to eliminate
it completely.⁶

How about men? What if men are evaluated for a counterstereotypical
job, such as the assistant vice president of Human Resources?
While this question has been studied less, it appears as if
men in counterstereotypical roles experience some of the same 
bias-informed dynamics as women do—with one important ex- 
ception: their likability is not affected. Male human resources 
managers may well be evaluated as less competent, but holding a 
stereotypically female position does not make them less likable. 
Women, thus, are in a double bind that men are not. They are per- 
ceived as either likable or competent but not both. 

Biases about whether or not a person fits matter. Being disliked, 
in addition to being extremely unpleasant, can injure, even derail 
your career. Mothers might be even more affected than women 
without children. Much evidence suggests that stereotypical 
perceptions of warmth work against mothers in the labor market. 
Unlikable individuals have been shown to receive worse perfor- 
mance ratings and be deemed less worthy of salary increases or 
promotions than their more likable counterparts. This appears to 
be true for both men and women. But while colleagues have lots 
of reasons to dislike someone, from dishonesty to arrogance, “it is 
only women, not men, for whom a unique propensity toward dis- 
like is created by success in nontraditional work.” This quote, 
from Madeline Heilman, one of the leading researchers in this 
field, can be rephrased more bluntly. Because of our biases, we tend 
to react to successful women much like we react to dishonest men: 
we do not like them and do not want to work with them.⁷

Numerous additional field experiments have been conducted 
in which male and female candidates, otherwise equally qualified, 
have applied for the same jobs, and again and again bias has been
found to influence outcomes. Whether applying to be waiters or
waitresses in the United States, or accountants, engineers, computer
programmers, or secretaries in the United Kingdom, or financial 
analysts in France, sex-based discrimination has been influential. A 
review of the evidence concludes that both women and men tend 
to be discriminated against in jobs that are associated with and 
dominated by the opposite sex. Men were discriminated against 
when seeking jobs as secretaries and women discriminated against 
when seeking jobs as engineers.⁸

While the evidence is still thin, there are some early signs that 
we are starting to see a change in this trend. For a 2013 segment 
of CNN’s Anderson Cooper 360°, the Heidi-Howard study was 
repeated at New York University’s Stern School of Business. 
The students still rated the successful female leader as less trust- 
worthy than her male counterpart, but they no longer liked her 
less. Indeed, they reported being more willing to work for her 
than for the successful man. In 2015, a scientific study reporting a 
reversal of this trend for entry-level jobs in academia appeared in 
the Proceedings of the National Academy of Sciences. Wendy Williams 
and Stephen Ceci found in five hiring experiments—in which 
faculty evaluated profiles of hypothetical male and female can- 
didates applying for jobs as assistant professors in biology, engi- 
neering, economics, and psychology—a substantial pro-female 
bias in all disciplines but economics. Are we starting to harvest 
the fruits of all the work that has been done to equalize the playing 
field in science, technology, engineering, and math—at least at 
the entry level? 

As I write this book, it is too early to tell. Maybe faculty in 
biology, engineering, and psychology are overcompensating for 
past inequities in an effort to move the needle a bit more toward
gender equality in hiring. Perhaps STEM fields are catching up
with other areas where we have seen an increase in gender diversity
at the entry level. Why this is happening in some fields but
not in my own, economics, remains unclear.⁹

Another field where gender diversity at the entry level has dra- 
matically increased is law. Analyzing panel data from more than 
6,000 lawyers employed by one of the largest law firms in the 
world from 2003 to 2011, Ina Ganguli, Ricardo Hausmann, and 
Martina Viarengo found that an almost equal number of male 
and female lawyers entered at the associate level. However, this 
did not translate into closing the gender gap at the top, where 
only 23 percent of partners were female. Instead, the gender gap 
at the most highly ranked position, partner, was strongly related 
to a gender gap in promotions. In their book Through the Laby- 
rinth, Alice Eagly and Linda Carli discuss how gender stereotypes 
constrain women’s access to leadership roles. In particular, biases 
affected evaluations of women vying for the very top positions, or 
what is commonly known as the glass ceiling effect. In contrast to 
the entry level, there is no closure of the gender gap at the top in 
sight.¹⁰

Ganguli and colleagues’ research shows why. The lawyers in 
their study worked for one firm in thirty-three different offices 
located in twenty-three countries on four continents. The three 
researchers had access to an almost unprecedented amount of 
individual-level data, including wages, bonuses, performance 
appraisals, educational background, employment status, career 
trajectory and leaves, as well as demographic characteristics. Pro- 
motion gaps persisted after controlling for all of these variables, 
including the fact that men and women left the firm at equal 
rates. But the degree of promotion gaps varied across countries— 
even though, at least in theory, the firm was guided by the same set
of policies and practices. Female lawyers found it hardest to climb
up the career ladder in countries where stereotypical thinking 
about gender roles was most pronounced, such as in the Russian 
Federation, Singapore, or Thailand, based on both survey data 
from the World Values Survey and the World Economic Forum 
Global Gender Gap Index as well as data concerning each coun- 
try’s gender gap in political representation. The playing field for 
female lawyers was more even in countries such as Belgium, the 
Netherlands, or Sweden.¹¹

Another study examined the performance evaluation bias 
toward highly successful women in a number of contexts, in- 
cluding commanding officers in the US military. It turns out 
the evaluating officers gave female subordinates whose pay grades 
were close to their own lower performance scores than they gave to 
male subordinates. The authors identify this phenomenon as gender 
hierarchy threat. Female (but not male) subordinates whose objec- 
tive performance was strong were punished by male (but not 
female) evaluators for violating gender norms.¹²

Let’s take stock: the gender gap in leadership is real; its rela- 
tionship to the gender gap in promotions is real; a connection ex- 
ists between the promotion gap and the extent of stereotypical 
attitudes. These dynamics have been demonstrated in various 
contexts and countries, but too little is known to determine to 
what extent they generalize from whites to all other demo- 
graphic groups. The stereotypes about “leadership fit”—or lack 
thereof—are hardly based on evidence. There simply are not 
enough women in positions of leadership to draw reliable infer- 
ences. Interestingly enough, when employees who prefer male 
leaders in theory are exposed to female leaders, they do not give
them lower ratings, a large 2011 survey finds. The bias against
female upper-level managers is in our heads, or, to quote from
one of my favorite textbooks on organizational economics by two 
Stanford economists, Paul Milgrom and John Roberts: “even if 
the beliefs are completely groundless, no disconfirming evidence 
ever is generated because women never get a chance to prove the 
beliefs are wrong. Thus, the baseless beliefs survive, and with them, 
the unjustified discrimination.” 

Our minds are not well equipped to deal with what is com- 
monly known as survivor bias. We constantly make inferences 
based on biased samples. The archetypal example is a study of 
World War II bombers. With the hope of making them safer, the 
planes were examined for weaknesses after they returned from 
their bombing runs. But, of course, these were the wrong planes 
to examine. They were just the ones that made it back. To learn 
about weaknesses, one would have had to examine all the 
planes—or, as Abraham Wald, a mathematician at Columbia 
University, concluded at the time, the scientists should not have 
looked for the bullet holes the returning planes had, but for the 
bullet holes they did not have. It was these other “holes” that determined
whether a plane made it back or not.¹⁴
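
Wald's reasoning can be made concrete with a toy calculation. The sketch
below is an illustration only: the hit counts and loss rates are
hypothetical numbers chosen for readability, not historical data, and the
names HITS_PER_SECTION and LOSS_RATE are invented for this example. It
compares the holes visible on returning planes with the holes that never
came back.

```python
# Hypothetical numbers for illustration only; these are NOT Wald's data.
# Assumed number of planes hit in each section across the whole fleet.
HITS_PER_SECTION = {"engine": 100, "fuselage": 100, "wings": 100}
# Assumed probability that a plane hit in that section fails to return.
LOSS_RATE = {"engine": 0.6, "fuselage": 0.1, "wings": 0.2}


def observed_hits(hits, loss_rate):
    """Hits visible on returning planes only (the biased sample)."""
    return {s: round(n * (1 - loss_rate[s])) for s, n in hits.items()}


def missing_hits(hits, observed):
    """Hits on planes that never returned: the holes nobody could inspect."""
    return {s: hits[s] - observed[s] for s in hits}


returned = observed_hits(HITS_PER_SECTION, LOSS_RATE)
lost = missing_hits(HITS_PER_SECTION, returned)

# Naive reading: reinforce the section with the most holes on survivors.
naive_pick = max(returned, key=returned.get)
# Wald's reading: reinforce the section whose hits are missing from the sample.
wald_pick = max(lost, key=lost.get)

print("Holes seen on returning planes:", returned)
print("Holes on planes that were lost:", lost)
print("Naive choice:", naive_pick)
print("Wald's choice:", wald_pick)
```

Under these assumed numbers, the section with the fewest visible holes is
the one where a hit was most often fatal, which is why the returning planes
alone point in the wrong direction.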

This does not sound intuitive—and it is not. I regularly teach 
a case study at Harvard on sample bias based on the fatal launch of 
the shuttle Challenger in 1986. The students are asked to reach a 
decision structurally similar to what NASA’s engineers faced, al- 
though in a different context where lives are not at stake. They 
are presented with the same data points on past successes and fail- 
ures that the engineers focused on and encouraged to seek more 
information. Only very few do. Rather, they base their decision 
on a biased sample. In doing so they experience the bias that has
been identified as one of the core mistakes that led to the launch
of the shuttle Challenger.¹⁵

Although bias is prevalent, it is important to note that not all
intuitive judgments are automatically inaccurate. Some intuitive
judgments are based on accurate stereotypes that reflect the true 
distribution of a given group’s characteristics. Consider the fol- 
lowing story, a popular mental sleight of hand. A father and his 
son are in a car accident. The father does not survive, and the son 
is badly injured. An ambulance takes the son to the hospital, 
where the surgeon cries out: “I cannot operate because this boy is 
my son!” 

For most of us, our intuitive reaction is, at first, confusion. 
Upon reflection, we realize that, of course, this is quite possible. 
The surgeon is the boy’s mother. 

About one third of all surgeons in the United States are female, 
so it is not that surprising that when we think of “surgeon” we 
also first think “man.” Economists refer to this as statistical discrimi- 
nation. People base their assessment of an individual person 
on group averages. They do this intuitively, as in the above ex- 
ample. They also do this to help them in situations where they do 
not have complete information about an individual’s relevant 
characteristics. 

In a field experiment demonstrating the existence of statistical 
discrimination, researchers sent in different buyers to negotiate 
with salespersons for a secondhand car. The sellers demanded a 
significantly higher initial price when the buyer was a woman or 
African American than when the buyer was a man or white. It 
appears that salespersons took advantage of the fact that on average 
female and African American car buyers have been found to be
less well informed of the price of a car. Because sellers were aware
of this, they statistically discriminated against these prospective 
buyers. The shadow statistical discrimination casts can be long and 
hard to escape. After being presented with the salesperson’s initial 
price, even well-informed African Americans and women could 
not close the gap and reach the price the salesperson offered to 
white men in negotiations.¹⁶

There are obvious practical lessons to be learned from evidence 
of statistical discrimination. Unsurprisingly, I advise all of my students,
but particularly women and people of color, to arrive extremely
well prepared at any secondhand car dealership (and, of
course, at any negotiation!). Not only should they do their homework
and know how much a car with certain attributes is worth,
but they should also have an understanding of how to trade off 
fewer years against less mileage. And most importantly, they need 
to make sure the salesperson is aware of their knowledge and ex- 
pertise before an initial price is presented to them. 

We clearly use group characteristics all the time when judging 
individuals. These judgments have real consequences—no less real 
than those resulting from unconscious biases. For example, the 
labor market penalizes women but rewards men for having 
children. The child salary penalty is a well-known statistical fact for 
women, as is the child salary premium for men. Some of this is due 
to statistical discrimination, with employers expecting that mothers 
will be more likely than fathers to cut back on their hours and, 
maybe, leave the workforce altogether. Their assessment is accu- 
rate. In academia, for example, the large majority of faculty taking 
parental leave are female. In one study surveying married tenure-track
parents of under-two-year-olds, 69 percent of the mothers
and 12 percent of the fathers chose to take parental leave. Faculty
who take leave have been found to be paid less. Accordingly, they
tend to engage in bias avoidance by not taking parental leave if at
all possible.¹⁷

Of course, statistical discrimination is not limited to gender. 
Racial profiling is hotly debated in the United States. Law enforce- 
ment has been reported to suspect an individual of breaking the 
law solely based on his or her race and ethnicity. To what degree 
a society wants to allow the use of demographic characteristics to 
prejudge people is a political, even a moral decision. As a society, 
we want our systems, laws, and organizational procedures to re- 
flect our values. Accordingly, in many countries it is considered 
immoral and often illegal to base hiring decisions, for example, 
on information the employer has about the group a person belongs 
to. Equality, whether between men and women or between people 
of different racial, ethnic, national, or other demographic backgrounds,
is a moral decision first.¹⁸

Consider, too, that much of what people believe to be statis- 
tical discrimination actually is not. Assessing the usefulness of a 
stereotype, say, in forming an opinion about a person’s trustwor- 
thiness or future performance is a cognitively demanding task. 
Many stereotypes were never accurate to begin with, and some 
lost their accuracy over time. Most people still believe women to 
be worse at mathematics than men. However, the evidence is much 
more nuanced and varies by country and population. Indeed, in 
recent years, the gender gap in math has reversed in several countries,
with girls outperforming boys in school on average. Stereotypes
have not been nearly so quick to reverse.¹⁹

Evidence from behavioral decision research suggests that we 
do not update our stereotypes accordingly. In fact, when we think
about a group, we do not even focus on the average, as suggested
by statistical discrimination, but rather we recall the group’s most
distinctive types. We are influenced by salient representatives of 
a given category. Arguing this point, a recent article provides a 
helpful illustration: Pause now and think briefly of people living 
in Florida. Whom did you think of? 

If you are like most others, you fell prey to the stereotype that 
most Florida residents are elderly. It turns out that this is not true 
at all. In 2013, 82 percent of Florida’s residents were younger than 
sixty-five—only slightly fewer than in the overall population of 
the United States (where about 86 percent were under sixty-five). 
At the same time, it is true that among the country’s older popu- 
lations, more live in Florida than elsewhere in the United States. 
Thus, the relative frequency of the elderly living in Florida is 
higher than in the comparison group, the rest of the United States. 
Such stereotypes are based on representative characteristics of a 
group, not on its average ones. If you were among the majority and 
thought that most Florida residents were elderly, perhaps conjuring 
up a grandparent in Tampa, you succumbed to a known bias, the 
representativeness heuristic.²⁰
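
The arithmetic behind the Florida example can be checked in a few lines.
This is a minimal sketch using only the two percentages quoted above; it
assumes the overall United States figure can stand in for the comparison
group, and the function names are invented for this illustration.

```python
# Shares of residents younger than sixty-five, as quoted in the text (2013).
UNDER_65_FLORIDA = 0.82
UNDER_65_US = 0.86  # assumption: overall US share used as the comparison group


def elderly_share(under_65_share):
    """Share of residents aged sixty-five or older."""
    return 1.0 - under_65_share


def relative_frequency(group_share, comparison_share):
    """How over-represented a trait is in a group relative to a comparison."""
    return group_share / comparison_share


florida_elderly = elderly_share(UNDER_65_FLORIDA)  # about 0.18
us_elderly = elderly_share(UNDER_65_US)            # about 0.14

ratio = relative_frequency(florida_elderly, us_elderly)
is_majority = florida_elderly > 0.5

print(f"Elderly share in Florida: {florida_elderly:.0%}")
print(f"Elderly share in the US:  {us_elderly:.0%}")
print(f"Relative frequency:       {ratio:.2f}")
print(f"Are most Floridians elderly? {is_majority}")
```

The elderly are over-represented in Florida by a factor of roughly 1.3, yet
they remain a clear minority of residents. The stereotype tracks the ratio,
not the average.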

When Florida calls to mind retirement communities and the 
aged, your System 1 is in control. In his 2011 masterpiece, Thinking,
Fast and Slow, Daniel Kahneman, a psychologist and 2002 Nobel 
Laureate in Economics, helps us understand how this works. He 
introduces the reader to two modes of thinking, System 1 and 
System 2, a distinction often used in psychology. Our intuitive 
System 1 runs automatically, without much effort or control. It 
assesses information quickly. Some might say it makes snap judg- 
ments and employs a number of mechanisms to deal with life’s 
complexity. It uses heuristics, or rules of thumb, to interpret the
world and relies on categories represented by archetypes. The deliberative
System 2, in contrast, is based on conscious reasoning,
requires effort, and is controlled. It is slower than System 1 and 
capable of abstract analysis and rule-based thinking. When we 
think of nurses, teachers, and engineers, System 1 supplies a repre- 
sentation of a member of this category who qualifies as “normal” 
or “typical.” We employ a stereotype when judging others. System 
1 is satisfied by what it observes in the moment, a process that 
Kahneman dubs WYSIATI, What You See Is All There Is.
System 1 has a need for internal consistency and confirmation of 
previously held beliefs, and thus finds it hard to update and incor- 
porate new information. 

Much of the psychologist Susan Fiske’s work has been devoted 
to better understanding how exactly this process works. She and 
her colleagues have developed a “continuum model of impression
formation,” a framework that helps us understand how we
form impressions of people. Most of us form first impressions 
based on social categories, such as sex, race, age, or social class. 
We then work to confirm our initial category-based assessments, 
sometimes re-categorizing if the available information no longer 
fits. Eventually, we integrate a person’s individual attributes if 
needed. She argues that “social categorization is a necessary, if un- 
fortunate, byproduct of our cognitive makeup.” Matching people 
to existing social categories helps us quickly make sense of the 
world, sizing up and classifying people based on our experiences. 
In short, we are economizing our cognitive effort.

Characteristics that manifest themselves in physical appearance
tend to dominate nonvisual cues. The color of your skin and the
cut of your hair, for example, matter more than your accent.
Among various visual cues, the one that stands out because of its
surrounding environment is most likely to inform a category-based 
impression. Because the lone female director on a corporate board 
and the sole male elementary school teacher stand out, we more 
quickly place them in categories. But we do not need marked 
physical traits or outliers in their environments to convince us that 
someone belongs in a particular category. Even when people (or 
objects, for that matter) were arbitrarily given a random label (some 
individuals marked as in the “purple” group and others marked as 
in the “orange” group), observers started to see similarities among 
members of the “purple” group and among members of the “or- 
ange” group. They also observed differences between members of 
“purple” and “orange.” In the most extreme case, people perceived 
some as the “ingroup,” or similar to themselves, and others as 
an “outgroup,” and treated each accordingly by allocating more 
rewards to ingroup members. 

Depressingly, unlearning is basically impossible. Once an ini- 
tial category-based assessment has been made, thereafter new in- 
formation is interpreted in a biased way, favoring consistency with 
the initial impression, a process known as confirmatory categori- 
zation. My colleague Mike Norton of Harvard Business School 
and others show in a number of experiments with how much cre- 
ativity our minds go about doing this. People were asked to eval- 
uate job candidates for a stereotypically male job in a construc- 
tion company. They were informed that both experience in the 
industry and educational background were important for the job. 
Among the top two candidates, one had more experience and the 
other more education. Given that it was a stereotypically male job, 
evaluators generally preferred male candidates. But what is more,
they justified their decisions, made on biased social categories, by
using information on experience and education selectively: when 
the male candidate had more experience but less education than 
the female candidate, they said that they valued experience more 
than education. When the male candidate had more education and 
less experience, they inflated the relative importance of education. 
Similar ex-post justifications have been demonstrated for race as 
well.²¹

A typical response to data like these is to conclude that they 
don’t apply to you. Sure, participants in these studies demonstrated 
these biases and fell prey to confirmatory categorization, but just
as a majority of us are sure we're better than average drivers, many 
of us imagine we'd do better. To which I say: have a look at the 
task in the following illustration, and please name the color of the 
words out loud. Measure how long it takes to recite the whole list. 


GRAY BLACK WHITE WHITE BLACK
GRAY BLACK GRAY BLACK WHITE
BLACK GRAY BLACK BLACK GRAY
WHITE WHITE GRAY WHITE GRAY
GRAY BLACK WHITE WHITE BLACK

[Figure: each word is printed in the color it names.]

The Stroop test, part 1


Now, name the color of the words out loud in the test that 
follows.

GRAY BLACK WHITE WHITE BLACK
GRAY BLACK GRAY BLACK WHITE
BLACK GRAY BLACK BLACK GRAY
WHITE WHITE GRAY WHITE GRAY
GRAY BLACK WHITE WHITE BLACK

[Figure: the same words, printed in colors that do not match the words.]

The Stroop test, part 2


Undoubtedly, you will have noticed that it is much harder 
the second time around. When the name of the color does not 
match the color of the type, the brain stumbles over itself. The 
effect is even more pronounced with red, blue, green, and so on. 
Your mind cannot help but read the word first; then it automati- 
cally determines the semantic meaning of what it sees: when you 
read black, you think black. That’s your System 1 at work, and it 
works very nicely in the first example. However, in the second 
example your System 2 has to come to System 1’s aid to disen- 
tangle the letters from the colors. This is not a particularly de- 
manding cognitive task, but it takes just a bit longer to make 
unusual associations than congruent ones. When the word is “gray,” 
it should look gray. When it does not, our minds need to do some 
work. 

I use this illustration, the Stroop test, with more colors than 
the print in this book allows whenever I teach about bias—and 
take pleasure in telling the audience that when my son was four 
years old he had no trouble beating them at this task. The explanation
is simple: he knew his colors but could not yet read. Alas,
we cannot leave de-biasing the world to four-year-olds. 

The psychologists Mahzarin Banaji and Anthony Greenwald 
and their colleagues have arguably done the most to uncover the 
unconscious decision processes leading us to biased judgments 
about others and about ourselves. Much of that has been due to 
the IAT, the Implicit Association Test, which, building on the 
Stroop test, Greenwald created in 1994 as a new tool to measure 
what is going on in our minds. 

The IAT asks people to make connections between words of 
different categories. System 1 makes such judgments automatically, 
using what Banaji and Greenwald, in their illuminating book 
Blindspot, refer to as “bits of knowledge about social groups.” Im- 
plicit bias is measured by how quickly people make associations. 
For example, are they as fast to associate John with reading and 
writing or Susan with mathematics, or is it easier for people to con- 
nect John with mathematics and Susan with reading and writing? 
Literally hundreds of thousands of people have now taken the 
IAT online and have learned often uncomfortable truths about 
themselves—that they are implicitly sexist and racist and biased 
against people with certain looks, sizes, religions, and so on. You 
should take a moment and go online and test yourself at
https://implicit.harvard.edu/. You may find that Susan is more associated
with reading and writing and John with mathematics, and perhaps 
some worse associations of which you were not aware. 

People who make more gender-stereotypic associations on the 
IAT have been found to laugh more at sexist jokes. But laughing 
is not all. The racial bias measured by the IAT predicted discrimination
in simulated hiring situations (preferring white applicants),
physician behavior (being more likely to recommend the optimal
treatment to whites), and voting (being more likely to vote for 
John McCain than Barack Obama in the US 2008 presidential 
elections), among others. In addition, in the STEM study discussed 
earlier in this chapter in which people were hired to perform an 
arithmetic task, the IAT accounted for evaluators’ initial average 
bias against women by measuring their implicit stereotypes. What 
is more, the evaluators’ performance on the IAT predicted the 
degree to which evaluators updated their beliefs when more in- 
formation was made available to them. The more biased the evalu- 
ators, the less often they were able to take individual performance 
data into account. 

It is crucial to appreciate that most of this happens
unconsciously. This makes quite alarming an estimate by Eric Kandel, a
neuroscientist at Columbia University and the 2000 Nobel
Laureate in Physiology or Medicine: Kandel guesses that 80 to 90
percent of the mind works unconsciously.

In addition to perceiving others in a stereotyped way, we also 
apply stereotypes to ourselves. Many women who take the IAT 
to explore gender biases experience the power of the unconscious 
mind. It is the truly rare woman who publicly subscribes to the 
belief that men are naturally superior at pursuing a career. But 
when women take the IAT, they are shocked to learn that they 
too instinctively associate careers with men and family with 
women. Such automatic gender stereotypes may lead to self- 
stereotyping that holds women back without their (conscious) 
knowledge.

A fascinating experiment entitled “The Emergence of Male 
Leadership in Competitive Environments” supports the notion
that stereotyping of the self and others interact in intricate ways.
A group of researchers showed that female MBA students at the
University of Chicago Booth School of Business were selected as 
leaders much less often than their established skills warranted— 
because women as well as men conformed to what we expect of 
their genders. Male MBA students behaved in a more overconfi- 
dent manner than their female classmates and were thus more 
likely to be chosen as leaders. In the experiment, before having to 
select a leader, everyone participated in a math task, learned about 
how well he or she did, and was paid based on his or her indi- 
vidual performance. Fifteen months later, the students were 
randomly assigned to groups, and each group selected a leader 
who would perform another math task on the group’s behalf. His 
or her performance would determine every group member’s 
earnings. 

Group members had five minutes to consult each other and 
decide who would be their representative to compete with other 
group leaders on their behalf. People could talk freely but had to 
state and record on a piece of paper how well each of them thought 
they would perform in the upcoming math task. It turns out the 
men were more optimistic about their future performance than 
their female counterparts because they misremembered how well
they had done in the past to a greater extent than women did. Men had
an inflated recollection of their past performance, overestimating it 
by about 30 percent; women also remembered their past perfor- 
mance as higher than it actually was, but by only about 14 percent. 
Based on these self-assessments, men were more likely to be chosen 
by their groups. Choosing the wrong leader had consequences. As 
you might expect, actual past performance was a much better 
predictor of future performance than the students’ memories,
most especially the male students’ inflated recollection.

What to do? If left unaddressed, we clearly have a problem. We
do not take advantage of 100 percent of our talent pool nor do we 
match the right people with the right jobs and positions. But, of 
course, the problem is far greater than that. Most societies across 
the globe take pride in their belief that they provide equal oppor- 
tunity for all. A wealth of research data stretching back decades 
proves this is not the case. That’s the bad news. 

The good news is that there is a fair amount we can do and 
can do quickly. We cannot rectify every aspect of gender in- 
equality, but we can address many. Designing equality is feasible, 
practical and, as the orchestras evaluating musicians behind a screen 
demonstrate, already under way. But before we turn our atten- 
tion to the problems on which we can make progress, we should 
acknowledge the limitations of behavioral design. It will not solve 
some of the very worst human rights violations women are the 
victims of, including sexual violence and human trafficking. Behav- 
ioral science does influence my thinking when serving on Harvard’s 
Task Force on the Prevention of Sexual Assault, and I hope it helps 
us change gender relations on university campuses—but some atroc- 
ities require a hammer instead of a nudge. Nick Kristof and Sheryl 
WuDunn’s important work Half the Sky provides guidance.

There are certainly people who intentionally discriminate 
against women, some of them committing horrible crimes and 
some others deliberately treating women inequitably in the work- 
place. And they will likely keep doing so as long as the benefits 
outweigh the cost. They have a “taste for discrimination,” a term 
coined by Gary Becker, the 1992 Nobel Laureate in Economics. 
Many societies have decided to make it costly for people to in- 
dulge their discriminatory tastes. There is also some empirical
evidence in support of Becker’s theory that competition can help.
Firms have been found to employ a larger share of women in more
competitive environments than, for example, when they are pro- 
tected. When a company can afford to discriminate against highly 
qualified but otherwise unwanted employees, namely women, 
many do.

Unfortunately, perfect competition hardly exists, and relying 
on it to eliminate taste-based discrimination will not succeed. 
Other hammers will remain important, including laws giving 
everyone equal rights, protecting people from discrimination and 
exploitation, and deterring wrongdoers by making the cost of
wrongdoing greater than the benefits.

Because the stakes are high by every measure, let me be clear. 
Far from all gender inequities are the result of unconscious bias, 
which is only one of the culprits unjustly disadvantaging some and 
benefiting others. And behavioral interventions are one instrument 
in our collective toolbox to correct for these injustices. Biases are, 
however, a clear cause of inequality, and behavioral designs can 
accomplish things that hammers cannot. There is no better tool 
in that toolbox to harvest some of the lowest hanging fruit. Women 
should not have to choose between competence and likability, nor 
should organizations and society be deprived of their best talent. 
In an interview conducted after the case study she inspired gained 
attention, Heidi Roizen said, “there were certainly times when I 
would walk into a room or a situation where I did not feel par- 
ticularly welcome. I don’t think beating your head against those 
walls is a very effective approach. I think I learned that pretty 
quickly.” 

No one should have to beat her head against the walls. Let’s
start redesigning them.


2 


De-Biasing Minds Is Hard 


Imagine that you are a plaintiff’s trial attorney. Your client has 
been badly injured in a car accident that you allege was caused by 
the faulty repair work of her automobile dealership. Your client’s 
injuries are quite serious—her recovery will take months and she 
will incur substantial medical bills—but you are unsure if you can 
prove the repair shop’s liability. You file suit nevertheless, seeking
$750,000 in damages. The automobile dealership’s insurance com- 
pany contacts you to see if a negotiated settlement can be reached, 
in the course of which you drop your original demand to $300,000 
and the insurance company counters by offering you $25,000 in 
damages. 

How should you advise your client? Obviously, that decision 
depends on how optimistic you are that you will win in court and 
how high you expect the award to be. You do your homework 
and come up with the following prediction: You expect that you 
have a 60 percent chance of winning and that the jury would award 
your client an estimated $260,000. Multiplying the two gives you
the expected value of going to court (for simplicity we will not
take other costs, such as legal fees, into consideration and assume
that you and your client are risk neutral). It is $156,000. Thus, you 
should accept any settlement offer higher than $156,000 and go 
to court otherwise. 

This is what my Harvard students conclude, on average, after 
reading a detailed fifteen-page case description. Or, at least, this 
is what the half of them tasked with representing the plaintiff 
decides. The other half represents the defendant. What do they de- 
cide to do? 

The defendant, of course, has to decide what settlement amount 
to offer and when it is in their interest to go to court. My students 
representing the defendant, on average, predict that the plaintiff 
has about a 40 percent chance of winning a jury award of about 
$180,000. For them, the expected cost of going to court is $72,000. 
Put differently, a risk-neutral defendant should make settlement 
offers up to $72,000 and take the case to court otherwise. 
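
The arithmetic behind both positions can be written out as a short sketch. The function names below are illustrative only and do not come from the exercise; the calculation assumes, as the text does, risk-neutral parties and no legal fees.

```python
def expected_court_value(p_win: float, award: float) -> float:
    """Expected jury award for a risk-neutral party, ignoring legal fees."""
    return p_win * award


def settlement_zone_exists(plaintiff_min: float, defendant_max: float) -> bool:
    """A deal is possible only if the defendant's ceiling reaches the plaintiff's floor."""
    return defendant_max >= plaintiff_min


# Average predictions reported for each side of the exercise.
plaintiff_floor = expected_court_value(p_win=0.60, award=260_000)
defendant_ceiling = expected_court_value(p_win=0.40, award=180_000)

print(f"Plaintiff accepts offers above ${plaintiff_floor:,.0f}")
print(f"Defendant offers at most ${defendant_ceiling:,.0f}")
print(f"Gap between the two: ${plaintiff_floor - defendant_ceiling:,.0f}")
print("Settlement zone exists:", settlement_zone_exists(plaintiff_floor, defendant_ceiling))
```

Because both sides multiply a probability by an award, a modest difference in each estimate compounds into a much larger difference in the product.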

How is it possible that the two parties come up with such vastly 
different estimates? In real life, you might conclude both sides have 
different pieces of information available to them. But in this exer- 
cise plaintiffs and defendants were given identical information as 
to the facts, witness statements, and the law. Any differences in 
their estimates must be based on how they interpreted the infor- 
mation provided. 

In their differing conclusions you see their biased assessments 
of the same information. The plaintiff’s attorneys paid particular 
attention to the information favoring their client’s claims while 
the defendant’s representatives focused on facts supporting their 
side. What is more, this biased assimilation of information affected 
their judgments despite the fact that they were asked explicitly to
assume that they “were a neutral outside observer familiar only with
the accident facts, witness statements and the law.” They were
unable to do so. Once they knew who they represented, they could
no longer assess information objectively but fell prey to a self- 
serving bias. 

Based on the numbers I gave you—the plaintiff seeking 
$156,000 to stay out of court, the defendant offering $72,000 to 
stay out of court—it is doubtful that the two parties will reach 
agreement. However, what I presented to you are average num- 
bers, based on more than 900 students who have participated in 
this exercise. Over the years there have been less optimistic plain- 
tiffs and more pessimistic defendants who managed to settle out 
of court. In fact, a bit more than half did so for an average settle- 
ment value of about $130,000. The real case the exercise is based 
on, by the way, settled out of court for $175,000 after twenty 
months of negotiations.

The price of succumbing to self-serving bias can be high. It 
tends to prolong disputes, make the parties more hostile, and lead to 
impasses or costly resolutions in court. Wouldn’t it be better if 
we could de-bias people before they begin negotiating and help 
them form more accurate judgments? What if we could assist 
people in overcoming unconscious biases, leaving stereotypical 
thinking behind and becoming less prejudiced? This isn’t an orig- 
inal hope. Many organizations, and if you are employed, the odds 
are good your place of work is among them, run diversity training 
programs with exactly these objectives in mind. Sadly, there is 
little evidence to suggest that they work. 

Two researchers at Carnegie Mellon University, Linda Babcock 
and George Loewenstein, were determined to find a cure for self- 
serving bias. They had observed Pennsylvania teachers’ unions and
school boards fall prey to such biases in salary negotiations. Ahead
of negotiations, teachers and boards sought districts to compare
themselves to—in a self-serving way. Babcock and Loewenstein’s 
analysis showed that the teachers found substantially higher refer- 
ence points than the school boards did. This inability to agree on 
appropriate comparable districts and salaries went hand in hand 
with increased strike activity. To test the impact of various inter- 
ventions that might defuse such negotiations, the researchers 
turned to the laboratory. 

In a series of experiments on legal disputes similar to the one 
that opens this chapter, Babcock and Loewenstein confirmed that 
self-serving biases were prevalent and, more sobering, found 
they are very hard to overcome. Typically, plaintiffs’ predictions 
of awards were about twice as large as what the defendants ex- 
pected, and they were substantially more optimistic about winning. 
What is more, self-serving biases did not go away with expertise. 
Lawyers and judges were as biased as the teachers, students, or 
inexperienced subjects tested in the lab—and nobody was aware 
of their biases. Indeed, much research suggests that experience 
alone is insufficient to correct biases. 

To help people come up with more reasonable predictions, the 
researchers experimented with a number of de-biasing techniques. 
For example, before they formed a judgment, some participants 
were informed of the bias and its impact. There is evidence sug- 
gesting that bias awareness can help overcome the need to con- 
form to stereotypes by triggering what psychologists refer to 
as stereotype reactance. Being made aware of the “leader = man” or 
“negotiator = man” stereotype, women have been found, for
example, to do better in negotiations than when stereotypes are 
activated only implicitly. Alas, awareness did not affect people’s
own predictions in the legal case—they remained as biased as
before. Interestingly, however, bias awareness did improve people’s
guesses about their opponent’s predictions. Apparently, when in- 
formed of the bias, people assumed that their counterparts would 
be affected by it but believed they were capable of assessing the 
information objectively. It may be that bias awareness works when 
the bias can be attributed to others, similar to stereotype reactance 
where people respond to others’ biases.

People are quite ready to see biases in others, but they over- 
look the very same biases in themselves. In one study, participants 
were asked to rate the appearance, accent, and mannerisms of indi- 
viduals introduced to them as math instructors. Before being rated, 
the instructors had to answer a number of questions. For the control 
group, the instructors answered the questions in a cold manner; the 
treatment group saw the same instructors answer the questions
warmly. Accordingly, the study participants rated the “warm”
instructors as more likable than the “cold” instructors. Finally, the 
participants were presented with three variables—appearance, 
mannerisms, and accent—and asked to rate the instructors based 
on them. Sure enough, they rated the “cold” instructor’s accent, 
appearance, and mannerisms as more disturbing. The very same 
instructor was given higher ratings when he came across as lik- 
able. However, evaluators were not aware of being influenced by 
an instructor’s likability. They did not state, “Because I liked him, 
I rated his appearance more positively.” They simply asserted their 
impression of his appearance, believing inaccurately that whether 
or not they liked someone held no influence. 

People routinely fall prey to the halo effect. A term coined by 
the psychologist Edward Thorndike, this effect occurs when an 
initial positive impression of a person impacts how favorably the
person is subsequently perceived. In this experiment, participants
were unaware that perceiving an instructor as likable affected how
they interpreted his appearance, accent, and mannerisms, nor did 
they believe it affected their ratings. Halo effects are pervasive and, 
as we will see in later chapters, have been proven to distort our 
views in job interviews. 

What is more, the halo effect did not go away when the evalu- 
ators were instructed to use introspection to make sure their judg- 
ments were unbiased. For example, when considering whether 
the instructor’s warmth might be clouding the participants’ judg- 
ment, say, of the math curriculum he planned to introduce, people 
were quick to come up with stories about why the new math cur- 
riculum was superior on its own merits. Instead of making people 
question their assumptions, introspection turns out to reassure 
people that they have been correct all along and that their con- 
clusions are based on sound reasoning. When asked how suscep- 
tible they think they are to biases or stereotypical judgments, 
study participants conclude routinely that they are less biased than 
the average.

It isn’t just that being made aware of biases doesn’t do the trick. 
It isn’t just that urging introspection about whether your judgments 
might be biased doesn’t work. It turns out that when research
participants are asked not to give in to their inclination to make
stereotypical judgments, things can backfire.

In one experiment on unconscious bias, study participants 
taking an Implicit Association Test were instructed to suppress the 
tendency to be more favorable toward flowers than toward insects
and toward whites than toward blacks. They were unable to do so. A
meta-analysis examining twenty-one studies aimed at reducing 
automatic stereotypes finds that suppression does not work. In
extreme cases, instructions to resist stereotypes had the opposite
effect, making stereotypes more salient and leading to an increase
in biased judgments. For example, students evaluated older job ap- 
plicants more negatively after watching a diversity training video 
asking them to suppress unfavorable attitudes toward the elderly. 
In addition, when people try to suppress racial bias and avoid referring
to race in situations where it would have been natural to mention it,
observers perceive them as more racially biased.

The same is true of hindsight bias. Sometimes referred to as the 
“knew-it-all-along effect,” it says that people tend to see the present 
as more predictable than it really was. When the meteorologist 
predicts a 50 percent chance of rain, the drenched commuter de- 
clares he was 100 percent sure it was going to pour. Research on 
overcoming this bias has shown a similar pattern. People cannot 
help but fall prey to it, even after having been taught about the 
bias and being explicitly instructed to avoid it. 

Baruch Fischhoff, an early contributor to this research, argued 
that for de-biasing to have any meaningful impact, it must involve 
at a minimum the following four steps: awareness of the possibility 
of bias; understanding of the direction of the bias; immediate
feedback when falling prey to the bias; and a training program with
regular feedback, analysis, and coaching. 

This, of course, is a tall order. How many of us have our su- 
perego sit on our shoulders to regularly monitor our attitudes and 
behaviors, analyze them for their root causes, and then give feedback 
on what to do about them? Arguably none of us, and certainly not 
all the time. Often, we do not realize that we are biased, and even 
more often we do not receive feedback in time to link a specific 
decision or behavior to our bias. And even if we do, we may well 
not act on the information received. Put bluntly, changing behavior
means work that the vast majority of us are not motivated to do.
Yet the $8 billion US corporations alone spend annually on
diversity training is spent largely in ignorance of this fact. Such training
sessions are unlikely to change attitudes, let alone behavior, if they 
set out only to make employees aware of their biases. Little is 
known about the effectiveness of diversity training programs, 
which differ widely yet are ubiquitous. Today, most US corporations 
offer some sort of diversity training. Some conduct workshops with 
trained instructors, others employ electronic formats and offer web 
seminars. Some focus on hiring and promotion and offer strate- 
gies to managers on how to avoid discrimination. Other programs 
are open to all employees and are geared toward fostering an 
inclusive culture and work environment. The strongest conclusion, 
drawn from one of the most comprehensive reviews of almost 
1,000 studies, which was conducted by Elizabeth Levy Paluck 
of Princeton University and Donald Green of Yale University, was 
“the dearth of evidence” as to whether they work. “Entire genres 
of prejudice-reduction interventions, including moral education, 
organizational diversity training, advertising, and cultural compe- 
tence in the health and law enforcement professions, have never 
been tested, as well as countless individual programs within the 
broad genre of educational interventions.” Similarly, a 2005 re- 
view of about sixty studies examining cultural competence training 
for healthcare providers concluded that no inferences could be 
drawn about their impact due to the studies’ lack of methodolog- 
ical rigor. 

The evidence from the few valid studies is sobering. One field 
experiment evaluated the effect of an anti-bias intervention in 
first- and second-grade classrooms in the United States where in- 
structors led a series of sessions about sex, race, and body type in
sixty-one randomly assigned classrooms over four weeks. The
experiment, “designed to widen their circles of inclusion to
include people who are different from themselves,” involved 830
children. The program had no impact on the children’s biases. 
Those receiving the training were equally unlikely to share or be 
happy about playing with others who were different from them- 
selves after the intervention as those in a control group which had 
not been exposed to the instruction. The program ever so slightly 
improved the pupils’ attitudes toward opposite-sex and opposite- 
race playmates, but it had no impact on their attitudes toward 
weight.

One of the few studies to examine whether diversity training 
programs correlate with a diversifying workforce was aptly enti- 
tled “Best Practices or Best Guesses? Assessing the Efficacy of Cor- 
porate Affirmative Action and Diversity Policies.” In this analysis 
of a national sample of more than 800 mid- to large-size US com- 
panies over three decades, between 1971 and 2002, my colleague 
Frank Dobbin of the Harvard Sociology Department and his co- 
authors found that diversity training has no relationship to the 
diversity of the workforce. In fact, in some cases diversity training 
programs were associated with a small drop in the likelihood that 
certain under-represented groups became managers.

Dobbin and his colleagues are careful to point out that they 
do not have a good understanding of why the diversity of the 
workforce seems to have little to do with whether or not a firm had 
a diversity training program. But a different body of research sug- 
gests a possible answer: the slow, deliberative thinking undertaken 
by our brain’s System 2 requires attention and effort. People who 
are already cognitively busy have been found to make more super- 
ficial judgments and use sexist language. It may be that people were
too depleted to exert the self-control that is required to create a
truly inclusive work environment. Pinning the efficacy of your
diversity training on employees’ over-tasked System 2 thinking 
might in fact open the door to unreflective, intuitive, and often 
biased System 1 thinking. 

Diversity training programs may lead to moral licensing, where 
people respond to having done something good by doing more 
of something bad. A particularly noteworthy experiment illus- 
trating this point was conducted in Taiwan. Some people were 
told that they had been given multivitamins while others were 
told they had received placebos. The people who thought they 
had taken the multivitamin were found to be more likely to 
smoke and less likely to exercise or choose healthy foods. In 
reality, everyone had been given a placebo—but the individuals 
who thought that they were enjoying the health benefits of di- 
etary supplements granted themselves a moral license to smoke 
more cigarettes.

Moral licensing, a relatively new field of inquiry, has been dem- 
onstrated in a number of domains, including bias. People who 
were given an opportunity to endorse Barack Obama in the 2008 
US presidential election, for example, were later on more likely 
to discriminate against African Americans. The effect was par- 
ticularly pronounced among people already racially prejudiced, 
raising the unsettling possibility that diversity programs aimed at 
influencing the worst offenders might backfire. A chauvinist man- 
ager who has undergone training might assume a moral license 
when conducting his next interview. Training designed to raise 
awareness about gender and race inequality may end up making 
gender and race more salient and thereby actually highlight dif- 
ferences. Indeed, according to Paluck and Green’s meta-analysis
mentioned earlier, interventions that discourage people from
paying attention to social categories might be particularly
effective in reducing automatic stereotypes.

At this point we have to conclude that diversity training either 
does not work or, at the very least, that we do not have enough 
evidence to know whether and under what conditions it does any 
good. Given the billions of dollars being spent globally on diver- 
sity training, this should give many companies pause. In part a re- 
sponse to these disappointing results, a few companies are trying 
different approaches, from implicit bias training to programs aimed 
at micro-inequalities, about whose impact we know even less. 
What, then, should the firm focused on achieving real results do?

Babcock and Loewenstein’s research on the effectiveness of 
various de-biasing techniques provides some hints. They tried two 
more interventions that had been shown to help in other contexts. 
You might have heard of perspective-taking. It is advice that you will 
get in almost any negotiation course. To negotiate more effectively, 
this advice runs, you should try to walk in your counterparts’ 
shoes, take their perspective, understand where they are coming 
from. Although it turned out not to have a big impact in the legal 
dispute case that opened this chapter, perspective-taking has been 
shown to impact people’s beliefs in other contexts. For example, 
“walking in an elderly person’s shoes” by writing an essay from 
their perspective has been shown to reduce stereotypes about the 
elderly. Similarly, perspective-taking interventions that instructed 
people to focus on others’ emotions, say by empathizing with 
African Americans when seeing or reading about discrimination, 
positively influenced attitudes and increased people’s interest in 
interacting with them. 

In another intervention, management students’ bias against
members of lower castes in India was attenuated through
exposure to a reality TV show, Satya Meva Jayate, hosted by a famous
Bollywood movie star. The program documented the atrocities 
and inequalities that lower-caste people often experience. In
emotionally charged language, people from the lower castes
shared their experiences of inhumane treatment. Then a former
justice of the Supreme Court reminded viewers of the values of
equality, fraternity, liberty, and justice treasured in the Indian
Constitution, followed by statistics on discrimination and
observations narrated by a documentary filmmaker.

This arguably heavy-handed approach building on much psy- 
chological wisdom worked. It decreased implicit bias, measured 
by an IAT, and increased the likelihood that the students felt more 
favorably toward lower castes compared to a control group with 
no exposure to the TV program. Those results were evident three 
months later when another IAT was administered. The study’s au- 
thors infer that “the emotionally charged nature of narrations was 
an important element in reduction in prejudice levels.” Perhaps 
empathic perspective-taking will prove to be an important element
of successful diversity training.

Babcock and Loewenstein gave one last bias-removing tech- 
nique a try. They experimented with what many perceive as the 
most general-purpose de-biasing strategy, namely a consider-the- 
opposite approach. This process encourages participants to play 
devil’s advocate with themselves and come up with arguments for 
why their thinking, including their conclusions, might be wrong. 
In the experiment, plaintiffs and defendants were made aware of 
the self-serving bias and in addition were informed that “it could 
arise from the failure to think about the weaknesses in their own 
case.” Then, they were instructed to write down their case’s
weaknesses. Thinking of holes in their arguments substantially
decreased both parties’ optimism, almost completely closing the
gap in their assessments. While the impasse rate had been 35 percent 
in the control condition, it now decreased to only 4 percent, im- 
plying that almost all plaintiff-defendant pairings were able to 
settle out of court. 

Calvin Lai and collaborators, running a contest on what in- 
terventions work to reduce racial bias, came to a similar conclu- 
sion. Exposing people to counterstereotypical images was one of 
the winners. A meta-analysis on automatic stereotype reduction 
suggests that a similar technique might also work for gender. 
Instructing individuals to “think counterstereotypical thoughts” 
about the social category or making counterstereotypes salient 
through exposure helped reduce automatic stereotypes, although 
the effect sizes were rather small, and it remained largely unknown 
how long the effects would last. 

Considering the opposite is part of “how to think” strategies that
also include logical reasoning and statistical methods. Students 
with coursework in mathematics, economics, and statistics have 
been shown to apply basic principles from those disciplines to their 
decision making, reducing the likelihood that they make decision 
errors. For example, a series of laboratory experiments showed that 
training in statistical reasoning inhibited the formation of
inaccurate stereotypes.

Traditional diversity trainings could be augmented with in- 
struction to help people think more clearly. One approach to 
increasing judgmental accuracy builds on the wisdom of crowds, 
showing that just taking the average of various forecasts outper- 
forms more elaborate predictive procedures. It certainly trumps 
relying on the loudest voice in the room or the result of group
discussions, which all too often fall prey to groupthink. A
person’s judgment can be improved even without outside forecasts
by thinking up several forecasts, picking the average, and in ef- 
fect benefiting from the crowd within." 

Here is how this works: Recall your role as an attorney for a 
client injured in a car accident. When asked to offer your best guess 
of the likelihood of winning in court and the jury award, write 
down your first answer. Then, take a moment to review the evi- 
dence you used for your prediction. In all likelihood, you will re- 
member different pieces of evidence during this second, more 
deliberative round than during the first, more intuitive, round. 
Write down your second estimates. Then, repeat the process one 
more time. Force yourself to consider information you have dis- 
regarded the first two times around. Look in places where you 
might not normally look. Ask questions that you do not normally 
ask—and then, write down your third prediction. Finally, take the 
average of your three guesses and go with it. Research suggests that 
using this crowd-within approach significantly improves judgmental 
accuracy.
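
The arithmetic of the crowd-within approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `crowd_within` and all of the numbers are hypothetical and do not come from the research described above.

```python
from statistics import mean


def crowd_within(estimates):
    """Average several of your own estimates of the same quantity."""
    if len(estimates) < 2:
        raise ValueError("need at least two estimates to average")
    return mean(estimates)


# Hypothetical example: three successive guesses by the attorney,
# each written down after reconsidering the evidence.
win_probability_guesses = [0.70, 0.55, 0.60]   # chance of winning in court
award_guesses = [120_000, 90_000, 105_000]     # expected jury award in dollars

final_win_probability = crowd_within(win_probability_guesses)
final_award = crowd_within(award_guesses)

print(f"Go with a win probability of {final_win_probability:.2f}")
print(f"Go with an expected award of ${final_award:,.0f}")
```

The sketch only does the averaging. The hard part, deliberately calling up different evidence before each new guess, remains with the person making the forecast.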

Finally, there is evidence of a different sort of wisdom
that arises from a crowd, wisdom that could prove useful for problems
broader than employee diversity training. Consider the following
experiment conducted in post-genocide Rwanda, a decade
after 10 percent of the population, including 75 percent of the Tutsi 
ethnic minority population, had been killed in 1994. Together with 
the nongovernmental organization La Benevolencija, researchers ran 
a randomized field experiment using a yearlong “education enter- 
tainment” radio soap opera that aimed to help people overcome 
prejudice, violence, and trauma and learn to communicate and 
cooperate across ethnic groups. To listen to the radio—the most
important form of mass media in Rwanda—people gathered in
groups in their villages. Thus, the randomization took place at
the community level with half of the communities assigned to 
the diversity and inclusion program, and the other half to a health 
program. 

To the NGO’s disappointment, the experiment had no impact 
on people’s personal beliefs. At the same time, and somewhat sur- 
prisingly, it did affect behavior: listeners to the pro-diversity pro- 
gram became more empathic and open to intergroup marriage, and 
were more willing to openly dissent, talk about trauma, and coop- 
erate. Individual behavior appeared to be more closely linked to 
changed perceptions of social norms than to personal attitudes. 
While this result was of utmost importance for reconciliation in 
Rwanda, it also has theoretical implications. Maybe the pathway 
to behavioral change is not a change in individual beliefs but instead 
a change in the socially shared definitions of appropriate behavior. 
While both mechanisms are likely relevant, norms are highly 
susceptible to behavioral design—opening the possibility for behav- 
iorally designed training programs to deliver on the promise of 
increased diversity.

Given all the evidence, what should an organization deter- 
mined to run a diversity training program do? I urge companies 
to refocus the training on capacity building and adopt the frame- 
work unfreeze-change-refreeze, based on a method of one of the pio- 
neers of applied social and organizational psychology, Kurt Lewin, 
and borrowed from my friend Max Bazerman, who together with 
Don Moore uses it in their wise book Judgment in Managerial Deci- 
sion Making. You should not just focus on raising awareness, but 
also offer specific tools that help people make better decisions. Finally,
think of ways you can refreeze the new insights gained and
the new behaviors learned.

Successful unfreezing happens when people start to question
their current strategies and become curious about alternatives. Ex- 
periencing one’s own biases, in an IAT for example, can be one 
such wake-up call. You should start your trainings with an un- 
freezing exercise. Unfreezing was also the goal of the previous 
chapter. By experiencing some of our own biases and learning that 
we are all in this together, we become curious about what went 
wrong, why, and what we might be able to do about it. The 
promise of behavioral design is that it offers an unobtrusive, low- 
cost way of changing behavior. 

Once unfrozen, you might want to spend a bit of time on what 
your organization is currently doing—much like this chapter in- 
vites you to review current approaches and learn how to do better 
in the future. Successful training focuses on how to promote 
change, understanding that it is far from easy. Our thinking and 
behaviors are ingrained in personal rituals and organizational prac- 
tices. Leaving the known, the status quo, for the unknown future 
bears risk. To make matters worse, a review of the status quo might 
reveal that past practices were inadequate and possibly counter- 
productive. Such learning can be painful, even threatening. 

But you can do it, using the tools outlined in this book. 
Nobody has to adopt all of my recommendations, but your 
organization can pick and choose—and learn. Research sug- 
gests that people are much more willing to “unlearn” old proce- 
dures and try out new ones when they are involved, so be sure to 
collaborate with colleagues rather than blindly implement any 
new procedures. Coworkers are also more likely to accept unfa- 
vorable outcomes when they think the process was fair. Once 
you have agreed on a new procedure, you then need to test how
it works.

I cannot overstate the importance of testing and measuring
what works and what does not. This chapter is purposely broad,
learning from de-biasing techniques targeting gender, race, class, 
caste, ethnicity, appearance, and age as well as cognitive biases, in 
very different contexts and countries. These research-based exam- 
ples should serve as inspiration but not relieve you from your own 
testing. We have not yet found the “one-size-fits-all” silver bullet. 
Perhaps we never will. Evan Apfelbaum of MIT and colleagues 
suggest, for example, that the relative share of the underrepresented 
group might inform which diversity approach to choose: “because 
racial minorities are generally represented in far fewer numbers 
than White women, focusing on notions of equality and fairness 
irrespective of social category differences may be particularly 
well-suited to address concerns among racial minorities, whereas 
explicitly recognizing differences and their benefits may be par- 
ticularly well-suited to address concerns among White women.” 

By the end of your program, you should think of ways to re- 
freeze the new insights gained. Reverting to past practices and bad 
habits is tempting. The final component of your program needs 
to focus on the organizational changes necessary to make it easier 
for our biased minds to get things right. 

Consider the procedure that many hotels have introduced: 
guests have to insert a room key card to turn on their room’s lights, 
and the lights turn off automatically when people take the card 
out to leave. The hotels realized that even well-intentioned and 
environmentally conscious guests often forget to turn off the lights. 
Hotels could just assume the costs of this, passing them on to guests 
in higher rates. They could remind guests when they check in to
always remember to turn off their lights. They could post signs in
rooms. Or they could solve the problem through a bit of technology
and some smart design.

The refreezing technologies I recommend in this book are 
based on behavioral design practices and procedures. They rely 
neither on traditional compliance mechanisms inducing adoption 
by rewards and coercion nor on people internalizing a new set of 
values. Instead, as the Rwandan experiment demonstrated, these 
designs can change behavior even though participants’ beliefs re- 
main unchanged. Indeed, this is the very promise of behavioral 
design; it can change behavior by changing environments rather 
than mindsets. Soon, and I hope that means by the time you finish 
this book, the question will be not if individuals and organizations 
interested in diversity and inclusion have tried these designs, but
why they haven’t.


Designing Gender Equality—Change Practices and Procedures

• Stop simple diversity training focused on raising awareness.

• Follow an unfreeze-change-refreeze framework.

• Train people in more reasoned judgment strategies, such as
consider-the-opposite or the crowd-within approach.


3 


Doing It Yourself Is Risky 


As academic dean at Harvard Kennedy School, one of my key 
responsibilities was faculty hiring and promotion, including nego- 
tiating compensation packages. This left me facing a dilemma. 
Having taught negotiation for many years before assuming this 
new role, I was not concerned with whether I was up to the chal- 
lenge. In fact, my dilemma arose from knowing too much. The 
research of three of my closest professional friends, Linda Bab- 
cock, Hannah Riley Bowles, and Kathleen McGinn, had taught 
me that women were less likely to negotiate than men, and if they 
dared to negotiate, people in my role would like them less. 

We discussed many of these insights at annual conferences 
on gender in negotiation I started to organize at Harvard in 
2004. Hosted by the Women and Public Policy Program, a re- 
search center I direct at the Kennedy School, sometimes in col- 
laboration with the Program on Negotiation at Harvard Law 
School and the Center for Gender in Organizations at the Sim- 
mons School of Management, we were a group of social scientists 
determined to unpack why it was that women did not seem to
have a comfortable seat at the negotiating table and what to do 
about it. The conferences culminated in a special issue of Negotia- 
tion Journal summarizing the insights gained, for which Bowles 
and I served as guest editors in 2008. Much of this chapter 
draws on this research and the follow-up investigations it helped 
spur. 

Succinctly, a wealth of research now shows that deans and other 
people around the world responsible for personnel decisions feel 
women who ask for better compensation violate gender norms. It 
isn’t just that we are biased to expect women to be collaborative, 
agreeable, and communal. It is that when we find certain women 
do not abide by these norms, we too often conclude we do not 
want to work with them. Much like the business school students 
who prefer working with Howard rather than Heidi, people prefer female
employees who don’t ask.

In a series of four experiments, Bowles, Babcock, and Lei Lai found
that managers were less likely to want to work with a female
employee who had asked for a pay increase, while a male employee
asking for the same increase suffered hardly any penalty. Their
work explained women’s disinclination to negotiate forcefully on
their own behalf by showing that “asking” penalized women in a way
it did not penalize men. The first experiment focused on hiring and
used undergraduate students role-playing a bank manager. The
participants were presented with a request from a job candidate
setting out a number of demands. The job candidate was given a
gender-neutral name. The study found that participants’ negative
reactions to the demanding job candidate were much larger when the
candidate was referred to throughout as “she” than when the
candidate was referred to throughout as “he.”

The second experiment focused on participants’ willingness to 
work with a recently hired employee. College-educated adults 
role-playing as senior managers were made aware of what com- 
pensation the new employee had sought at the time of hire. 
Knowing that a job candidate identified as “he” had attempted to 
negotiate for higher compensation (either the “softer” ask, that he 
be paid “at the top of the pay range” and receive a performance 
bonus; or the “harder” ask, that he receive top pay and get a bonus 
equivalent to 25–50 percent of salary) had no significant effect on
the participants’ willingness to work with him. However, when 
the same candidate was identified as “she,” their willingness to 
work with her was significantly reduced. 

A third experiment replicated the second, but with videotaped 
interviews, with a male actor and a female actor performing as the 
job candidate and striving to approximate each other in manner 
and presentation. Both male and female participants were signifi- 
cantly less willing to work with demanding female candidates. 
Male participants proved willing to work with demanding male 
candidates. And, lastly, female participants were less willing to 
work with any demanding candidate, male or female. This experi- 
ment also revealed that male participants’ disinclination to work 
with demanding female candidates was fully explained by their 
perception of whether or not a female candidate was “nice” or 
“demanding.” 

A fourth experiment asked participants to role-play as candi- 
dates for a job. They were tasked with choosing at the time of their 
“interview” to either ask for a top-range salary and a performance 
bonus or not. When the participant’s evaluator was identified as
“he,” women were significantly less likely than men to ask force- 
fully for a higher salary and a bonus, but when the evaluator was 
identified as “she,” men and women were equally unwilling to ask. 
Remarkably, candidates appear to have anticipated evaluators’ 
behavior, expecting that male evaluators would give male candi- 
dates the benefit of the doubt but penalize women while female 
evaluators would penalize male and female candidates equally for 
asking, which in fact was what they did. 

While most evidence on discrimination suggests that the sex 
of the evaluator is less important than the sex of the evaluated, it 
is not uncommon that lower-status individuals, in this case women, 
are more concerned about violating norms when confronted with 
a high-status individual, in this case a male evaluator. Bowles 
and Michele Gelfand found that the pattern applies not only to 
sex but also to race. In fact, it also applied when individuals were 
randomly assigned high or low status in an experiment. When 
judging deviant behavior, high-status evaluators were more inclined
to punish low-status individuals than fellow high-status
individuals. Low-status evaluators, on the other hand, were more
inclined to treat high- and low-status individuals equally.

Negotiating matters. In fact, it matters profoundly. People who 
are less likely to ask for better compensation are not just worse
off than those who are willing to do so, but considerably so. Among 
master’s degree students at Carnegie Mellon, almost all of the 
women, namely 93 percent, refrained from negotiating their em- 
ployer’s initial salary offer. Of the men, fewer than half, namely 
43 percent, accepted the first offer, Babcock and Sara Laschever 
report in their path-breaking book Women Don’t Ask. That alone is 
astounding. But there is more. The researchers report that men’s
starting salaries were $4,000, or almost 8 percent, higher than
women’s. The rippling consequences hardly stop there. Willing- 
ness to negotiate also affects career advancement, as a former stu- 
dent of mine, Fiona Greig, found in a US investment bank. Women 
bankers again proved less likely to negotiate than their male peers. 
And those more willing to negotiate advanced more quickly in the 
firm than their less assertive colleagues. There is more. Greig dem- 
onstrated that a candidate’s assertiveness had nothing to do with 
his or her performance, meaning that the more assertive employee, but
not necessarily the best performer, was being promoted.

Sadly, even when women negotiate, they tend to ask for less. 
Social science graduates in Sweden indicated on a survey whether 
their potential employer requested them to make an explicit wage 
bid, and if so how much they asked for. The survey also asked for 
the final wage offer. Otherwise equally qualified women applying 
for similar jobs as their male counterparts asked for and ended up 
with lower wages. Not only did employers counter women’s
already lower demands with stingier counter-offers, they also
responded less positively when women tried to self-promote.

Women, it turns out, cannot even exercise the same strategies 
for advancement that men benefit from. I was also aware of the 
empirical research conducted in the UK academic labor market 
for economists. Not only did female economists not optimize their 
negotiation positions as frequently as men by seeking, for example, 
outside offers, but when they did, their improved positions did 
not translate into as many compensation goodies. To start with, 
holding productivity constant, female economists received fewer 
outside offers than their male colleagues did. And in contrast to
the men, the outside offers women received hardly mattered for
the counter-offers their current employers made. Sitting behind
my new office door, I was all too aware of the fact that female
academics cannot be blamed for seeking fewer outside offers than their
male colleagues. Such offers were literally worth less to them.

Thus, as academic dean I was left with a dilemma: if I repre- 
sented the institution’s interests and negotiated zealously on its be- 
half, I could take advantage of gender biases in negotiation. I 
would benefit from the fact that my female colleagues might not 
want to ask for perks and raises because of a reasonable fear of so- 
cial backlash. However, if I tried to walk too much in their shoes, 
anticipate or even try to compensate for the biases I knew all too 
well skewed negotiations, I would not meet the expectations that 
came with the job. I had a limited budget available and had to use 
the money as wisely as possible. 

Using it wisely meant, of course, attracting and retaining the 
most qualified individuals. Increasingly, managers are aware of 
the negotiation dilemma women face. And they do not want to 
lose disenchanted female employees who, after accepting a job 
and compensation package, find out that they were given a worse 
deal. Jennifer Lawrence, the Academy Award–winning actor,
wrote about her dismay in October 2015 after a hack had revealed 
how much less she was paid than her male costars: “I would be 
lying if I didn’t say there was an element of wanting to be liked that 
influenced my decision to close the deal without a real fight. I 
didn’t want to seem ‘difficult’ or ‘spoiled.’ At the time, that seemed 
like a fine idea, until I saw the payroll on the Internet and realized 
every man I was working with definitely didn’t worry about 
being ‘difficult’ or ‘spoiled.’”

Neither disregarding nor exploiting gender norms is wise, let 
alone fair. What then should managers do? Before telling you what 
strategy I followed, it is important for you to be broadly familiar
with the evidence. One important research discovery by Bowles, 
Babcock, and McGinn is that transparency about negotiability is 
crucial. A survey of a set of MBA students in their first job revealed
an average gender gap in pay of $6,000, controlling for appropriate
variables. However, in fields with low ambiguity where
applicants had good information about what to negotiate for, 
the gender gap almost vanished. In contrast, in fields with high 
ambiguity, men made about $10,000 more than women on 
average. Various experiments further corroborated this pattern. 
Gender effects appeared primarily when the situational cues re- 
garding expected behavior were ambiguous. When, for example, 
cues about a position’s typical wage range were clear, women were 
as good at negotiating as men. 

In a field experiment, Andreas Leibbrandt and John List ex- 
amined this further. They placed two different job advertisements 
for administrative assistant positions in nine major US cities. One 
ad made it clear that wages were negotiable; the other was am- 
biguous. An interesting gender pattern emerged among the 2,500 
job seekers who responded to the postings: men were more likely 
to apply to jobs when it was left ambiguous whether wages were negotiable
than when the expectation of negotiation was made explicit.
Did they know Bowles and collaborators’ laboratory evidence doc- 
umenting that ambiguity worked fine for men but not for women? 
Unlikely—but men clearly seem to be more comfortable with 
ambiguity and appear to expect to do better in a situation where 
negotiation is not expected. Indeed, the male job seekers were 
more likely to negotiate than their female counterparts when 
the ad did not indicate that wages were negotiable. The opposite
was true for women. They were more willing to negotiate when
the ambiguity was removed and the ad “invited” them to do so.
Given the negotiation dilemma women face, external legitimization
helps them overcome that hurdle.

Consider the following anecdote. On December 19, 2014, 
President Barack Obama gave a press conference just before de- 
parting with his family for the holidays in Hawaii. It was business 
as usual but for one small difference, which instantly made the 
news: he called on female reporters only. What appeared to have 
been a deliberate move on the president’s part was particularly 
remarkable given that the White House press corps has tradition- 
ally been dominated by men. Helen Thomas, the first woman to 
cover the president of the United States, was only admitted to their 
ranks in 1960. 

Whether or not Obama was familiar with the relevant re- 
search, members of his administration certainly were. Victoria 
Budson of the Women and Public Policy Program had briefed the 
White House Council on Women and Girls on several occasions. 
One point she made more than once was that ever-increasing data 
show that women shy away from negotiation, do not speak up as 
often as their male colleagues do, and are less likely to be called 
upon. 

Certainly, Google was aware of the evidence, as Laszlo Bock, 
head of People Operations at the company, describes in his fascinating
book Work Rules! Analyzing their data, they had found a
gender gap: women were less likely to nominate themselves for a 
promotion than their male counterparts. Survey evidence collected 
by Francesca Gino of Harvard Business School and her collabora- 
tors suggests that women find promotions less desirable and are 
less likely to pursue them because they expect stronger negative
outcomes than men from promotion to a higher-level position (for
example, stress or anxiety, difficult trade-offs or sacrifice, time
constraints, burden of responsibility, or conflict with other life goals).

Bock and others at Google decided to try and do something 
about it. Accordingly, they sent out emails such as the one repro- 
duced below to all technical Google employees: “I wanted to up- 
date everyone on our efforts to encourage women to self-nominate 
for promotion. This is an important issue, and something I feel 
passionately about. Any Googler who is ready for promotion should 
feel encouraged to self-nominate and managers play an important 
role in ensuring that they feel empowered to do so... We know 
that small biases—about ourselves and others—add up over time 
and overcoming them takes a conscious effort.”

Even after being given an explicit invitation to do so, women 
might still feel that self-promotion is too risky. It is important to 
realize this is not a matter of timidity, but of backlash. Bowles and 
Babcock discussed the social cost of asking with Sheryl Sandberg 
when they joined her on Katie Couric’s television program to dis- 
cuss Sandberg’s illuminating book, Lean In. The research, they 
explained, was clear. We use different measuring rods to evaluate 
men’s and women’s behavior, or as Babcock said on the show, 
quoting Laura Liswood of the Council of Women World Leaders: 
“Women when they display anger come off as too aggressive. You 
know there’s an old saying: men are too aggressive when they 
bomb countries, women are too aggressive when they put you on 
hold on the phone.”

Bowles and Babcock’s research suggests that to lean in safely, 
women can invoke someone else, maybe a supervisor, to legiti- 
mize their decision to negotiate. The specific protocol they tested 
in their research had the negotiator say the following: “My team
leader during the training program told me that I should talk with
you about my compensation. It was not clear to us whether this
salary offer represents the top of the pay range.” It worked, making 
women’s demands more acceptable. In addition, women might 
want to use more inclusive language and benefit from what the 
researchers refer to as “relational accounts.” Women can improve 
their negotiation outcomes and mitigate the potential social
consequences of asking by embedding their requests in a larger
organizational context. They can legitimize requests made on their own
behalf by showing concern for the organization, for example by arguing
that in displaying their negotiation skills they are showing an asset
that will benefit the organization.

Such subtle changes can reap rewards. Sheryl Sandberg, for 
example, finds replacing “I” with “we” particularly powerful. We- 
language represents communal values and further embeds the indi- 
vidual’s ask in the larger organizational context. Both attempts at 
legitimizing the ask can work for women, but neither the two 
researchers nor Sandberg were particularly excited about the 
message this sends. I can sympathize. Knowing in detail that 
biases are prevalent, skew the playing field, and require more of 
women seeking commensurate compensation than of men is 
dispiriting. I am always uneasy when I share these findings with 
my female students, but I also remind myself that just telling them 
to be patient and wait until we have fixed the system is an even 
worse answer.

Without explicit invitations or external legitimization, it turns 
out even women in leadership positions speak up less than their 
male counterparts, and for good reason. In the Swedish parliament, 
the Riksdag, female members of parliament give significantly 
fewer speeches than their male colleagues, despite the fact that over
40 percent of the MPs are women. This is well above the average
in Europe, where approximately 25 percent of parliamentarians
are women, and in the United States, where below 20 percent of 
legislators are women. In the US Senate, power—measured by 
tenure, track record of legislation being passed, and leadership 
positions—is an excellent predictor of a senator’s likelihood of 
speaking up on the floor—but only for men. For female senators, 
power does not translate into more speaking time. 

Victoria Brescoll of Yale University examined why this might 
be the case. She asked a group of professional men and women to 
evaluate the competence of chief executives. The executives, male 
and female, differed in how much they spoke. Male executives 
who spoke up were rewarded with higher ratings of competence 
compared to their quieter peers. In contrast, both male and fe- 
male evaluators punished women for speaking up and gave the 
female executives who spoke more than their peers substantially 
lower ratings. Women do not simply prefer saying less. Rather, in 
response to their environments, they understand that the “male 
way” might not work for them and so behave differently. This en- 
capsulates the gender inequality bind that women find themselves 
in. The “male way” is the accepted way of advancement, but it not 
only doesn’t work for women, women who adopt it are penalized 
for doing so. Women cannot break the ice on their own.

Being called on by the president or a supervisor helps, but much 
more is required to level the playing field and allow organizations 
to benefit from everyone’s contributions. In fact, the risks and re- 
wards are even higher than just ensuring equal opportunities for 
everyone to contribute. Often, organizations are failing to hear 
from their best people. A former student of mine, Katie Baldiga 
Coffman, now at Ohio State, found in an experiment that it is
particularly the very knowledgeable women who under-contribute.
But everyone, men and women, contributed too little in areas in
which their gender was stereotypically considered to grant them 
less expertise. In Coffman’s study, male and female participants 
were randomly put into groups of two. Groups were then pre- 
sented questions and multiple choice answers in a number of 
areas, including gender stereotypical areas such as “Arts and Liter- 
ature” and “Sports and Games.” Participants had to decide whether 
or not to submit their answer as the group’s answer, with the par- 
ticipant most willing to do so—measured by how quickly he or 
she decided—being chosen. Most people expect women to know 
more about, say, the arts and men more about, say, sports—which 
is correct. But the quiz-takers overestimated how much more the 
other gender knew and thus missed opportunities to contribute. 
This was even more remarkable because in Coffman’s experiments 
people remained anonymous and consequently did not have to fear 
backlash. Everyone would have benefited if men and women, and 
in particular the top women, had offered their opinions more 
often. But even in the safety of anonymity, even the most informed 
women held back. They fell prey to self-stereotyping.

So, on becoming dean, I decided to do a few things differently: 
first, I watched who asked. It was very tempting to infer from 
the act of asking that the person wanted something very badly and 
thus was also the most motivated to do the best job. Familiar with 
evidence refuting that assumption, I worked to keep it in check. 
Second, I kept track meticulously of what people asked for. It 
is very difficult to avoid being affected by the demands put on 
the table, and I did not want to respond to just these requests. 
Negotiation scholars call this anchoring. If men ask for more than 
women, then the typical negotiation dance in which the parties
move closer to decrease the gap between demands and offers yields
a gender gap in pay. Thus, rather than focusing on my counterpart’s
demands, I anchored myself at the going market rate and
internal comparators. Third, I invited my counterparts to ask for 
what they wanted and needed—obviously, without promising 
that I could deliver. I tried to be as transparent as I possibly could 
about what was negotiable. Finally, I monitored the Kennedy 
School’s compensation packages, promotion rates, pay raises and 
other relevant data by gender (and other characteristics) to make 
sure we did not inadvertently discriminate against a particular cat- 
egory of people. 
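
Why responding to demands reproduces a gap in asks can be shown with a small worked example. The Python sketch below is a simplification under stated assumptions: it models the "negotiation dance" as simply splitting the difference between demand and offer, and every figure, as well as the function name `split_the_difference`, is hypothetical.

```python
def split_the_difference(demand, offer):
    """Toy model of the negotiation dance: settle halfway between the two."""
    return (demand + offer) / 2


# Hypothetical figures, chosen only for illustration.
first_offer = 100_000   # employer's opening offer
higher_ask = 130_000    # a candidate who asks for more
lower_ask = 115_000     # an equally qualified candidate who asks for less
market_rate = 112_000   # going rate used as the employer's own anchor

# Responding to the demands on the table: the gap in asks becomes a gap in pay.
pay_after_higher_ask = split_the_difference(higher_ask, first_offer)
pay_after_lower_ask = split_the_difference(lower_ask, first_offer)
gap_when_reacting = pay_after_higher_ask - pay_after_lower_ask

# Anchoring on the market rate instead: both candidates receive the same pay.
gap_when_anchored = market_rate - market_rate

print(f"Pay gap when reacting to demands: ${gap_when_reacting:,.0f}")
print(f"Pay gap when anchored on the market rate: ${gap_when_anchored:,.0f}")
```

In this toy model half of the difference in asks survives into pay. Real negotiations are messier, but the direction of the effect is the one described above.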

I should now admit that when I was offered a job as assistant 
professor at Harvard in 1998, I did not ask—at all. I later learned 
that I could have and promised myself to handle things differently 
if ever given the chance to negotiate my salary and benefits again. 
The opportunity arose when I was granted tenure and needed to 
negotiate terms with the dean. Becoming a full professor with 
tenure is a big step for an academic, and it comes with sizable 
financial implications. So, being much more knowledgeable of 
the research on gender inequality and some of its causes in 2006 
than I was in 1998, I did my homework, was lucky to have re- 
ceived an outside offer from a prestigious competitor, and was gen- 
erally well prepared for my negotiation with the dean. Alas, I was 
also keenly aware of the evidence on social backlash. I wanted a 
nice salary, but I also wanted a good relationship. The strategy I 
chose may or may not be helpful to others, and I have never eval- 
uated it systematically, but it felt right to me at the time: I shared 
with him what we have discussed in this chapter—and then asked. 

He and I still have a good relationship. What is more, when I 
assumed the position of academic dean I learned about everyone’s
salaries. Consequently, I can attest that I was treated fairly in that
negotiation. Maybe it was just him, maybe this only works in an
academic environment where people care deeply about research- 
based evidence, or maybe it works in your circumstances as 
well. I have heard numerous stories of female executives who, 
after having participated in one of the Kennedy School’s execu- 
tive training programs, went back to their organizations to rene- 
gotiate their pay. They applied what they had learned, but also 
explained the broader context and the particular challenges women 
face. It did not always lead to a pay raise, but in many cases it 
opened the door to a meaningful discussion. 

One additional and important piece of evidence I always leave 
my students with is that in many negotiations we negotiate on be- 
half of others. For me, this is one of the most empowering re- 
search findings. The negotiation dilemma completely disappears 
when women negotiate for someone else. This has no influence 
on men, but it gives women a great boost. Think of attorneys de- 
fending their clients, doctors advocating for their patients, profes- 
sors sponsoring their students. There is no gender role conflict in 
these situations as women are expected to care about the people they 
represent and fight hard to advance their interests. A meta-analysis 
including more than 10,000 subjects (students as well as executives) 
confirmed the importance of representation and transparency.

Indeed, many women (and men) negotiate their pay not just 
for themselves but on behalf of their families. An early study 
showed that this gives negotiators a justification to keep more for 
“their group” than they would have claimed for themselves, in 
particular, when the outcomes were public. While empowering 
on the one hand, the impact of household dynamics on external 
negotiation is a topic that social scientists have only recently
started to unpack. Many internal negotiations take place within
the household, on how time is allocated, income generated, care
given, money spent, children raised, and so on. Bowles and McGinn
offer a two-level negotiation framework for analysis. In
it, the members of a household engage in both intra-household 
negotiations with their spouses or partners, and external negoti- 
ations with organizations. The two levels affect each other. For 
example, a key determinant of bargaining power in the household 
is earning power outside of the home. Thus, if gendered norms 
hold women back in their salary negotiations, they may also af- 
fect their status within the household. 

But of course the reverse can also be true: gendered norms 
within the household can influence women’s outside opportuni- 
ties. For example, we know little about how to change the expec- 
tation that husbands should earn more than their wives. Research 
suggests that violations of this norm have consequences. Based on 
US Census data, we can see that marriage rates decline as women 
become more likely to earn more than men. Similar trends can be 
found in other countries. For example, in Latin American coun- 
tries marriage rates have also decreased as the gender gap in edu- 
cation has been reversing, producing more women who are better 
educated than men. What is more, Marianne Bertrand of the 
University of Chicago and colleagues find that as the likelihood 
increases that a wife’s potential income will eclipse her husband’s, 
the wife is more likely to stay at home. And when she does keep 
working and makes more money, she compensates by also doing 
more at home. Clearly running against the economic logic of 
division of labor, these findings strongly suggest that we need
more research to better understand how negotiations within the
household work.

Nava Ashraf is among the few to experimentally examine the
impact of bargaining power on how couples spend their money, 
conducting field research in the Philippines. She varied how much 
information spouses had on each other’s financial choices. Knowl- 
edge means power, of course, as the more one spouse knows about 
the other’s use of money, the more she or he is able to interfere and 
question decisions. It turns out that the information was particu- 
larly empowering to those in control of household savings, mainly 
women. When unobserved, husbands kept money for themselves, 
but when required to share information, they were more likely to 
put it into their wife’s account.

As we are rarely able to observe intra-household negotia- 
tions directly, researchers have relied on outcomes to infer how 
efficient the negotiation has been. Consider the example of 
farmers in Burkina Faso. It is not uncommon in Africa for men 
and women to own their own land, even after marriage. Spouses 
also pool their money to buy tools and other useful assets, such as 
seeds and fertilizer, and together work on each other’s plot, sharing 
the collective spoils of their labor. But, as in the Philippines, 
household members in Burkina Faso had different preferences 
about how the pooled income was used—and this kept them 
from maximizing the size of the pie before haggling over who 
gets which slice. 

It turns out that many more resources were used on the men’s 
plots, making them much more productive than the women’s plots 
over time. The main reason was the additional fertilizer applied 
(controlling for all other relevant variables). The irony, of course, 
is that the benefit generated by adding fertilizer to a plot of land 
declines steeply the more that plot is used. Fertilizer can help an
over-farmed plot, but not as much as it can help an undertilled
field. Thus, the men’s plots gained less from the extra fertilizer than
the women’s plots would have, for it was on the women’s plots that
the fertilizer could have made a huge difference. The
households would have been significantly better off if the fertilizer 
had been allocated more wisely. Put differently, money quite liter- 
ally was left on the table just to increase men’s relative bargaining 
power in the household. 
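
The logic of diminishing returns can be sketched with a toy calculation. The square-root yield curve, the ten units of fertilizer, and the function names below are assumptions chosen only for illustration. They are not taken from the Burkina Faso data.

```python
import math

# Hypothetical illustration: the yield curve and quantities are assumptions.

def plot_yield(fertilizer: float) -> float:
    """Assumed concave yield curve: each extra unit adds less than the last."""
    return math.sqrt(fertilizer)


def household_yield(husband_plot: float, wife_plot: float) -> float:
    """Total yield when fertilizer is divided between the two plots."""
    return plot_yield(husband_plot) + plot_yield(wife_plot)


# The same ten units of fertilizer, divided two different ways.
lopsided = household_yield(9.0, 1.0)
even = household_yield(5.0, 5.0)

print(f"Nine units on one plot, one on the other: {lopsided:.2f}")
print(f"Five units on each plot: {even:.2f}")
```

With any yield curve that flattens as more fertilizer is applied, moving a unit from the heavily fertilized plot to the lightly fertilized one raises the household's total.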

The implications of such inefficiencies in intra-household 
negotiations are important. In her insightful survey of the relation- 
ship between economic development and women’s empowerment, 
Esther Duflo of MIT concludes: “This means that we cannot rely 
on the family to correct imbalances in society.” Households are 
affected by the very same gender norms, leading not only to 
inequities but to everyone being worse off. If women do not 
contribute their knowledge in meetings and do not have access to 
the resources they need, we all are worse off for it.

In an ambitious project, Ashraf and McGinn are studying 
whether teaching girls how to negotiate might be able to address 
some of this. Specifically, they examine the impact of teaching 
Zambian girls how to negotiate with their parents, guardians, and 
other adults about their future. The stakes are high. The girls they 
are teaching frequently must negotiate for the right to remain in 
school, marry later, or say no to “sugar daddies” offering to pay 
for school tuition and supplies in return for sex. Their random- 
ized controlled trial is still ongoing. It aims to measure the impact 
of negotiation training on education and health outcomes in the 
lives of about 3,000 girls in eighth grade in Lusaka public schools. 
While early results are promising, they also show that these young
women are up against a host of formal and informal constraints
and quite aptly respond to them.

However skilled you might be, overcoming biased environ- 
ments on your own—in the workplace and at home, in Zambia 
or the United States—is hard and risky. In the spring of 2014, the 
failed attempt of an academic at negotiating a tenure-track job 
offer made the news in the United States. After she had made a 
few demands, the college withdrew the offer. A New Yorker article 
commenting on the case was aptly entitled “Lean Out: The Dan- 
gers for Women Who Negotiate.” 

In order for everyone to be able to “lean in,” we need to make it 
safer. And we can—by changing the constraints people face. Be- 
havioral design focuses on changing these constraints. Clearly, this 
is a tall order in a complex environment such as the one the Zam- 
bian eighth-grade girls face. But we have had significant successes in 
the past. For example, women’s willingness to invest in their educa- 
tion has been shown to be a response to changing constraints. When 
the Pill became available, more women started to pursue profes- 
sional degrees and work more often outside the home. The intro- 
duction of infant formula had a similar effect on married women of 
child-bearing age. And better household technologies, from
dishwashers to microwaves, have helped all women participate more.

These technological innovations were neither low-hanging 
fruit nor behavioral interventions. But they move beyond “self 
help” approaches, as Anne-Marie Slaughter refers to women’s 
attempts at moving toward gender equality themselves in her il- 
luminating book Unfinished Business, and illustrate the power of 
changing people’s opportunity sets. Given an opportunity, many 
and perhaps most people will take it. Sometimes a nudge rather
than a shove is all it takes. For example, while conditional cash
transfer programs, typically paid out to households that commit 
to sending their children to school, have been successful in in- 
creasing children’s school attendance, they are expensive and 
complicated to administer. Recent evidence from Morocco shows 
that similar and even bigger effects can be achieved with a much 
smaller investment. What the researchers referred to as a “labeled 
cash transfer” was given to fathers not on the condition that 
their children attend school, but merely as part of an education 
support program. The sum was modest and could not be consid- 
ered a meaningful incentive on its own. However, by explicitly 
tying the gift to the goal of education, the government was able to 
send a signal about the importance of education, make this salient to 
parents, and influence behavior.

Redesigning the context in which women and men negotiate 
works in the same way. The equivalents of the labeled cash transfer
for helping women negotiate more effectively are transparency,
relational accounts, and negotiating on behalf of someone else. While
the latter two are helpful strategies that women should adopt, 
transparency is the design feature that countries and organizations 
should implement immediately. When given the opportunity to 
negotiate in a less ambiguous environment, women (and men) will 
take it. 

The Obama administration has picked up on this evidence. On 
April 8, 2014, President Obama signed Executive Order 13665, 
which prohibits federal contractors from retaliating against workers 
for discussing their salaries with one another. The order states that 
when employees are prohibited from discussing their compensation
with fellow workers, pay discrimination by sex and race is
more likely to persist. Prohibiting employees from discussing their
pay limits the information available to applicants and future
employees, which creates an opaque process that inhibits salary 
negotiation. This order allows employees to talk about their 
compensation without fear of being fired, which may promote 
transparent hiring practices by employers and more equal pay for 
employees. Valerie Jarrett, then senior advisor to the president, 
commented on the executive order, saying: “With this new trans- 
parency, we can have an honest conversation. So many times 
women have no idea that they’re being discriminated against. They 
have no idea what their counterparts are making.”

Increasing transparency is low-hanging fruit. It is an easy and 
practical de-biasing design. Failing to do it is not just ethically du- 
bious, it is very much like leaving the most fertile plot you own 
undertilled. 


Designing Gender Equality—Create Equal Opportunities 
for Negotiation 


• Invite people to speak up or initiate negotiations.
• Increase transparency about what is negotiable.
• Have people negotiate on behalf of others.


4 


Getting Help Only Takes You So Far 


In German, they are called “Frauenförderungsprogramme,” sup- 
port programs for women. They have gained in popularity in Eu- 
rope’s largest economy. According to a law passed in March 2015, 
Germany’s largest 100 DAX companies have to fill 30 percent 
of their supervisory board seats with women starting in 2016. 
Another 3,500 companies had to submit plans on how they 
intended to increase the share of women in senior positions in 
September 2015. But are these support programs, including lead- 
ership training, mentoring, and networks, effective? 

With biases being so deeply ingrained in our unconscious 
minds, trying to de-bias people via diversity training has proven 
to be challenging. Asking women to do it themselves has been dif- 
ficult and often risky, for women leaning in can experience back- 
lash. If women cannot quite do it on their own, can they be much 
more successful with help? 

To start with, not enough women have been getting help. According
to the 1995 US Glass Ceiling Commission, lack of management
training has been a key barrier to women’s advancement
in corporate America for decades. Indeed, research suggests that 
firms tended to statistically discriminate against women by of- 
fering them less access to development programs than their male 
colleagues. By now, many professional schools, including Harvard 
Kennedy and Business Schools, Stanford Business School, London 
Business School, INSEAD, and IMD, to name but a few, offer 
executive education programs focused on women’s leadership. The 
programs vary in scope. Some use a generic leadership curriculum. 
Others aim to teach women the skills necessary to succeed in a 
“man’s world.” Yet others focus on the organizational context that 
makes it harder for women than for men to develop as leaders. 
Many combine these and other features.

That we have begun to level the playing field in terms of ac- 
cess to development opportunities is good news. But, of course, it 
tells us little about the impact of such access. I am not aware of 
any rigorous evaluation of leadership development programs for 
women. Recently, some critical voices were raised. A 2014 study 
reporting evidence based on interviews with personnel managers 
in Germany, Austria, and Switzerland went as far as to suggest that 
these programs might even be counterproductive: “Women are 
stuck in development and coaching programs while the men get 
the jobs.” In the same year, a McKinsey report, “Why Leadership- 
Development Programs Fail” was equally skeptical and offered some 
suggestions for improvement: tailor programs to focus on the core 
competencies relevant for the business, couple them with on- 
the-job projects, and uncover “below the surface thoughts, feel- 
ings, assumptions, and beliefs” that slow or prevent behavioral 
change. The report concludes by lamenting the absence of rigorous
evaluations of existing programs and with a call to action, a point echoed
by a 2015 Harvard Business Review article entitled “Evaluate Your
Leadership Development Program.”

Shockingly, despite the $14 billion that, according to Mc- 
Kinsey, US companies spend on leadership development annually, 
the impact of leadership training, let alone leadership training 
specifically targeted at women, is largely unknown. A study that 
does allow us to make causal inferences about the relevance of 
leadership training is based on a hybrid program that combined 
leadership training with mentoring. It stems from an initiative 
sponsored by the National Science Foundation and the American 
Economic Association, and was inspired by Frank Dobbin and col- 
leagues’ findings that mentoring programs were associated with 
an increase in diversity in management in the over 800 companies 
they examined. Indeed, they reported that mentoring programs 
went hand in hand with the successful increase in diversity among 
all seven traditionally discriminated-against groups, namely, white 
women as well as African American, Hispanic, and Asian American 
men and women. While there is more research to be done, coupling
leadership training with mentoring offers great promise.

In 2004, the Committee on the Status of Women in the Eco- 
nomics Profession decided to offer a special workshop to female 
assistant professors. The program would turn into a much-loved 
and highly valued experience. Former doctoral students of mine 
who have had the good fortune of participating describe it as 
“wonderful” and “absolutely invaluable.” In an exit survey, most 
participants gave the workshop the highest possible mark. But we
do not have to rely only on my students’ word, or judge only
from their exit comments, whether and how the participants
benefited from the program. The founders, four leading professors of
economics, Francine Blau, Rachel Croson, Janet Currie, and
Donna Ginther, designed the program so that it would allow us 
to draw conclusions based on hard evidence. They designed a ran- 
domized controlled trial examining its impact. 

Like most other professions, the field of economics suffers from 
a “leaky pipeline.” Many more women get PhDs in economics 
than there are tenured female economics professors. As in any pro- 
fession, the causes of this are many, but one particularly striking 
feature in academic economics departments is the gender gap in 
promotion rates. Depending on the specific study, time frame 
and controls included, researchers have found a promotion gap 
of between 14 and 21 percentage points. Even allowing for systemic
gender inequality, this finding is notable. Women are less
likely to be promoted to tenure in economics than in political 
science, statistics, the life sciences, physics, and engineering. To the 
discipline’s credit, the committee decided not only to start a men- 
toring program to assist female junior faculty in overcoming the 
tenure hurdle, but to do so in a way that allowed them to judge 
its efficacy. 

Since its inception in 2004, the program has been repeated 
every two years. After the first three iterations, the creators of the 
program took stock. Of the applicants to the 2004, 2006, and 2008 
workshops, a bit more than half had been randomly assigned to 
receive the training. The others were relegated to a control group 
that did not receive the additional support. All applicants were 
aware of this possibility and accepted it as necessary to producing 
valid evidence. 

The program brought participants together in a workshop that 
lasted two days. They were matched with senior faculty mentors
in small groups based on research interests. The sessions included
feedback on individuals’ work as well as briefings on specific skills
needed to succeed in academia, such as research and publishing, 
teaching, grant-writing, professional networking and exposure, 
the tenure process, and work-life balance. While some of the skills 
were directly related to the work of an academic, such as research 
and teaching, other skills the program focused on, such as profes- 
sional exposure and networking, are nearly universal. Performance 
across professions is not based just on an individual’s results, but 
includes how individuals and results are “sold.” In their interim 
review, the four economists who designed the study compared the
performance of those who received mentoring and those who 
did not. The evidence made clear that the program had worked: 
the young academics who had participated in the program had 
more publications and more success with their grant applications 
than the unlucky applicants not part of the treatment group.

A meta-analysis on mentoring at work (not specifically focused 
on gender) based on forty-three studies also finds positive conse- 
quences measurable in compensation, promotion, and career sat- 
isfaction. However, the effect sizes were modest. Evidence from 
other domains comes to similar conclusions. A meta-analysis on 
mentoring of youth covering more than seventy studies, for ex- 
ample, found that overall mentoring was related to young people’s 
developmental outcomes, but again, effect sizes were rather small 
and long-term benefits unclear.

Catalyst, a leading NGO researching and advocating for the 
expansion of women’s opportunities in business, as well as Sylvia 
Ann Hewlett of the Center for Talent Innovation and a number 
of other leaders in the field, have argued that sponsorship might 
be even more effective than mentorship. In 2008 and 2010 surveys
of more than 4,000 employees, all 1996 to 2007 graduates of
top MBA programs from around the world, Catalyst found that 
women were slightly more likely than men to have mentors. But 
they had different sorts of mentors than did men—less senior, with 
less organizational clout. In addition, the female employees’ men- 
tors tended to coach and advise them while the men’s mentors took 
on a more active role and became advocates for their protégés, ac- 
tively helping them advance their careers. They went beyond 
mentoring and into active sponsorship. 

Sponsors make sure their protégés get visibility and are con- 
sidered for promising opportunities. They negotiate on their be- 
half for interesting job assignments, promotions, and pay increases. 
Sponsors either vicariously or directly benefit from their protégés’ 
successes. Some firms even hold sponsors accountable for how 
well their protégés do and reflect this in their pay. Why mentors 
of women did not become sponsors isn’t clear. It is possible that 
demographics played the deciding role. The pool of available 
mentors for men and women was predominantly made up of men. 
Implicit biases may also have played a role, with female mentees 
being less aggressive in seeking the most from mentors, and male 
mentors unconsciously penalizing more assertive women. But 
what was clear was that by 2010, the men in the sample had received 
15 percent more promotions than their female colleagues.

Building on Catalyst’s survey evidence, Katie Baldiga Coffman, 
together with her mother, Nancy Baldiga, constructed a lab ex- 
periment to take a closer look at some specific aspects of mentoring 
and sponsoring. Specifically, they were interested in two questions: 
How does the well-established gender difference in self-confidence 
affect sponsoring? Does sponsoring work better for men than for
women because men are more self-confident to start with? And
do instrumental approaches, such as rewarding sponsors, work for 
women as well as they do for men? 

The team designed their study to examine these two aspects 
of sponsorship. Building on the work suggesting that women are 
generally less willing to compete than men, at least in the United 
States and many European countries, the two authors chose com- 
petitiveness as their outcome variable. Would being chosen by 
someone who cares boost women’s self-confidence and make them 
more willing to compete? Presumably, by choosing a specific pro- 
tégé, sponsors send a signal to that person that they believe in his 
or her talent. This signal might be particularly credible, however, 
when the protégé knows that the sponsor has a direct interest in 
his or her success. In the lab, the latter can be easily accomplished 
by tying the sponsor’s compensation to how well the protégé is 
doing. 

In both the Catalyst study and the Baldiga and Baldiga Coffman 
study, it turns out that being chosen by a sponsor works better for 
men than for women. The signal of being chosen increases most 
men’s willingness to compete and their earnings, but it does 
nothing for most women—with one interesting exception: spon- 
soring helps the most talented women. Exactly why only talented 
women benefit isn’t entirely clear. Maybe a certain level of self- 
confidence is required to profit from a sponsor’s attention? With 
men generally being more self-confident than women, a topic we 
will discuss in detail later, they might find it easier to benefit from 
a sponsor’s endorsement than women—with an exception for those 
women at the top who have enough data available to confirm that 
they “deserve” to be chosen.

Firms with sponsorship programs generally report that they
work. While hardly hard evidence, it is worth noting that the 
firms’ findings do not run counter to the lab’s. They report that more
of the protégés were promoted to more senior roles than a com- 
parable group without sponsors, and turnover rates for protégés 
decreased. Obviously, the firms’ evidence is not based on a random 
sample. They may have done a good job selecting the most tal- 
ented protégés as well as offering them the right kind of sponsor- 
ship. And evidence from firms tells us little about potential gender 
differences in a program’s effectiveness. 

As someone who has assisted such programs, I have been struck 
by one particular comment that the female protégées typically 
make: to a degree far greater than the men, the women appreciate 
not only the knowledge gained and the support received from the 
leaders of their firm, but also the connections that they have been 
able to make with peers. Indeed, much has been written about the 
importance of networks. For example, it has been argued that 
same-sex networks are particularly important for women due to 
the scarcity of senior female role models. We tend to relate to 
and learn from similar others. When looking for sponsors or 
mentors, we typically try to find someone with the same demo- 
graphic characteristics. But, of course, the demographic mix in an 
organization complicates and can even thwart such efforts. 

In joint research with Farzad Saidi, I show that this puts women 
at an informational disadvantage. Our argument and experimental
evidence are based on a simple statistical insight: the smaller the
sample you have available, the noisier the information you receive.
Consider the new hire trying to figure out what is deemed appropriate
clothing for a client meeting. If you can only learn from
the behavior of, say, two comparable others, and one of them wears
a business suit and the other one is dressed casually, it is hard to 
know what the appropriate way of dressing is. If, on the other 
hand, there are fifty comparable others at your disposal, you will 
be able to make more precise inferences about the dress code. We 
show that such an informational disadvantage can affect perfor- 
mance, leaving members of smaller identity groups, such as women, 
worse off than members of larger identity groups, such as white 
men. But the logic does not only apply to gender. For example, 
religion-based, country-of-origin, or same-language networks 
have been shown to facilitate business relationships, job seeking, 
and even participation in welfare and social programs.
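
The statistical insight above, that a smaller sample gives noisier information, can be made concrete with the standard error of an observed share. The sketch assumes that half of all comparable colleagues wear a business suit and that each colleague is an independent observation. Both assumptions, and the function name, are illustrative only.

```python
import math

# Hypothetical illustration: the true share of 0.5 is an assumption.

def standard_error(share: float, sample_size: int) -> float:
    """Standard error of a share observed among independent colleagues."""
    return math.sqrt(share * (1 - share) / sample_size)


for comparable_others in (2, 50):
    noise = standard_error(0.5, comparable_others)
    print(f"{comparable_others} comparable others: standard error {noise:.3f}")
```

Under these assumptions, the estimate drawn from two colleagues is five times as noisy as the estimate drawn from fifty, because the standard error shrinks with the square root of the sample size.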

In a network analytic study of men’s and women’s networks in 
an advertising firm, Herminia Ibarra of INSEAD shows that or- 
ganizational demography, the relative fractions of particular demo- 
graphic groups represented in an organization, not only influences 
the quantity but also the quality of relationships people have. Men 
tend to have both male mentors as well as male friends; women 
also have female friends. However, possibly by necessity due to the 
dearth of senior women or by choice in order not to use their 
same-sex friendships for strategic purposes, women tend to gain 
access to career and advancement opportunities through networks 
with men. 

Building on these insights, Ibarra has developed a Network As- 
sessment Exercise that allows people to better understand the net- 
works they are part of and whether they help them identify career 
opportunities and advance professionally. The exercise is often
used in leadership development programs where participants can 
assess how their networks compare to the breadth and depth of
other people’s networks. There is good news and bad news.

The bad news is that women are less likely to make strategic
use of their friendships, viewing this as perhaps endangering or 
inimical to friendships. They also suffer from less free time than 
men: building relationships that one can benefit from often occurs 
during after-work activities that women, carrying a heavier do- 
mestic workload, participate in less often. 

The research, however, has a silver lining: quality trumps quan- 
tity. Having a few sponsors who know and believe in a person 
may well be more important than relating to many others who 
take only a vague interest in your life. And while Dobbin and 
colleagues’ work cannot establish a causal pathway between net- 
works and diversity, they find a positive relationship between 
networks and the share of women in management.

But much is yet unknown. A great deal of research still needs 
to be done to unpack what kinds of networks work for whom, 
why sponsoring and mentoring work for some but not for others, 
and how leadership development training can be improved to 
maximize its impact. We know even less about training targeting 
men. In 2015, the Australian Workplace Gender Equality Agency 
started to focus on helping men navigate work and life. It produced 
a documentary following the lives of five men in senior manage- 
ment roles who felt that, as one partner in a law firm said, “Flexi- 
bility in the workplace is equally important for both genders; we all 
have families and interests outside work.” Another, a self-confessed 
workaholic and senior manager at Telstra, the country’s leading 
provider of mobile devices, phones, and broadband internet, de- 
scribed the situation as follows: “A lot of my own work habits, I’ve 
spent about 15 years building up. They’re a bit like smoking.” 

Conceptualizing work hours as an addiction offers an intriguing
opening for behavioral interventions. It may also serve as
an illustration of why training programs should not just tell us
what to do, but also help us follow through. Consider smoking 
and the present bias that often thwarts people wishing to quit. 
Their System 2 plans to quit smoking but their System 1 wants to 
smoke—now. Their desire to quit is sincere, but quitting can wait 
until tomorrow, or, even better, next month. Smoking is part of a
larger family of issues afflicted by present bias. Such present-biased 
preferences can help us understand why it is hard to quit smoking, 
eat healthily, or work out regularly. We want to have that choco- 
late now or sleep in today and defer eating an apple or going to the 
gym to another day. Humans are procrastinators, reliving the same 
experience again and again, as of course “tomorrow” never comes 
and that chocolate will taste exactly as sweet a month from now.

The good news is that most of us can take inspiration from 
Ulysses. We understand that temptation exists and try to protect 
ourselves from it. Ulysses asked his crew to bind him to the mast 
of his ship so that he could listen to the Sirens’ song but could 
not submit to the temptation to follow the Sirens into peril. (His 
crew had to put wax in their ears.) Some of us follow the crew’s 
example and do not have chocolate in the house. Others have 
more intricate mechanisms and buy prepackaged rations to make 
sure that their unthinking, System 1 self eats only a set amount 
determined beforehand by their System 2 self. George Loewenstein 
refers to this reflective time as the “cold” state—but it changes to 
“hot” when we find ourselves in front of the dessert buffet at the 
next dinner party. When out with friends, it is difficult to exert 
enough self-control to say no to that additional drink, even though 
we had planned to have only one that night. 

And the same is true for work, particularly for those of us who
love our work. While System 2 promised our spouse in the morning
to be home on time for dinner, System 1 tempts us at the close of
the work day into quickly finishing that one brief. In that moment, 
you are up against the instant gratification of completing something 
specific versus the uncertainty of spending time with your family. 

If you are a teacher or trainer who wants to help people better 
balance work and nonwork activities, or if you are a workaholic, 
you may want to consider introducing self-commitment devices. 
Such a device might be a fixed time at which you have to pick up
your child from the daycare center or meet a friend at the gym. Generally,
choosing in advance and making an active decision have been 
demonstrated to increase the likelihood that people follow through 
on their “should” choice. For example, when new employees 
were required to make a compulsory choice about enrollment in 
a pension plan instead of opting in at their leisure, this increased 
enrollment rates by almost 30 percent. But smart precommitment 
also includes buying smaller plates and glasses to reduce calorie 
intake. A word of caution, though: precommitment can be ex- 
pensive. Paying for that yearlong gym membership may prod you 
to lift more weights or attend more spin classes but at a higher 
price. Ulrike Malmendier and Stefano DellaVigna have shown
that gym-goers who choose long-term contracts end up with a
per-visit cost that is 70 percent higher than what they would have 
paid if they had paid for each visit separately. 
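
The per-visit arithmetic behind such a finding is easy to sketch. The Python snippet below is only an illustration: the monthly fee, the number of visits, and the pay-per-visit price are hypothetical figures chosen so that the premium comes out at 70 percent, not numbers reported in the text.

```python
def per_visit_cost(contract_fee: float, visits: float) -> float:
    """Effective cost of a single visit under a flat-fee contract."""
    if visits <= 0:
        raise ValueError("visits must be positive")
    return contract_fee / visits


def contract_premium(contract_fee: float, visits: float,
                     pay_per_visit_price: float) -> float:
    """Fraction by which the contract's per-visit cost exceeds the
    price of simply paying for each visit separately."""
    return per_visit_cost(contract_fee, visits) / pay_per_visit_price - 1.0


# Hypothetical figures (assumptions for illustration only).
MONTHLY_FEE = 68.0          # flat monthly contract fee in dollars
VISITS_PER_MONTH = 4        # visits actually made in a month
PAY_PER_VISIT_PRICE = 10.0  # price of a single visit without a contract

premium = contract_premium(MONTHLY_FEE, VISITS_PER_MONTH, PAY_PER_VISIT_PRICE)
print(f"Per-visit cost under contract: "
      f"${per_visit_cost(MONTHLY_FEE, VISITS_PER_MONTH):.2f}")
print(f"Premium over paying per visit: {premium:.0%}")
```

The fewer visits a member actually makes, the higher the effective per-visit cost climbs, which is why an optimistic forecast of one's own gym attendance can make the contract expensive.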

Alternatively, you can visit the website Stickk.com. The be- 
havioral economist Dean Karlan and his colleagues wanted to 
create a mechanism that would help people stick to their plans. 
Stickk.com was their brainchild. The next time you plan to lose 
those twenty pounds, log on to their website and write a contract 
with your future self. Declare your goal and what you will do if
you do not meet it—perhaps donate a certain amount to a charity.
Make sure you have a monitor checking how well you do and
make the price high in case your future self fails to perform. There’s
no use in committing to pay $10 if you have not lost the
twenty pounds within six months. Rather, $10,000 will give you
pause and spur you to go to the gym. In order to provide a cred- 
ible incentive, the amount must hurt. Karlan and a colleague were 
successful in losing thirty pounds over nine months, using a sim- 
ilar mechanism. If one of them did not meet the target, they had 
agreed to pay the other $10,000. They also instituted follow- 
through incentives and kept checking on each other for years. And 
there was at least one instance where one of them in fact had to 
pay the fine (which he did).¹⁰

Follow-through should also be an integral part of our leader- 
ship development programs. Some of the most interesting work 
on how this can be done is not from the corporate world. Rather, 
a number of nongovernmental organizations are experimenting 
with different types of leadership development programs. For ex- 
ample, several NGOs working on microfinance, having realized 
that microfinance was not able to live up to its great promise of 
lifting people out of poverty by itself, have started to couple it with 
business training.

While obviously not the same kind of program as the executive
education programs offered by universities and the leadership de- 
velopment trainings run by consultancies in the United States and 
Europe, the NGOs’ evaluation still offers us a glimpse at the ef- 
fects such programs can have. My colleague Rohini Pande of the 
Kennedy School and collaborators evaluated a two-day business 
training program offered to about 600 women micro-entrepreneurs 
between the ages of eighteen and fifty in Ahmedabad, India, by 
SEWA (Self-Employed Women’s Association) Bank. The program
focused on business skills, financial literacy, and leadership.
The training worked for some but not for all. It increased bor- 
rowing and business income for upper-class Hindus but not for 
lower-class Hindus or Muslims. Such mixed results are not un- 
usual for financial literacy training. A collaboration among more 
than one hundred researchers, the World Bank, the OECD, and 
the Russian Federation offers an excellent overview of the field 
evidence. It concludes that a broader definition of financial literacy 
is required, one that includes both training that raises awareness 
and provides knowledge and skills, as well as behavioral interven- 
tions making it easier for people to follow through on the knowl- 
edge gained. The report thus shifts from talking about “financial 
literacy training” to focusing on “financial capacity building.”

One of the most compelling lessons is to keep it simple. A
group of researchers compared the effectiveness of a traditional 
financial education program aimed at business owners in the 
Dominican Republic with a program that focused on delivering 
simple rules of thumb. Simplicity outperformed complexity by 
far—echoing the themes of two recent, important books: Simpler 
by Cass Sunstein, and Scarcity by Sendhil Mullainathan and Eldar 
Shafir. People, and particularly poor people who are worried about 
day-to-day survival, have limited attention and cognitive capacity 
available to learn about and implement new concepts.

Supporting some of the lessons offered by McKinsey, another
field experiment in India further found that microfinance clients 
benefited most from targeted training that did not just focus on 
general financial education but offered individualized financial 
counseling, including helping people set personal financial goals.

Goal setting matters. Much research suggests that establishing
a goal increases performance. But not all goals are created equal,
and there appears to be some art, in addition to science, in setting
goals right. Specific, challenging goals tend to help us focus our 
attention and make us persistent, but they are particularly likely 
to work their magic if there are not too many goals and we are 
personally committed to them. Our values need to be aligned. 
And certain goals are by definition hard to reach and it might take 
months or years to do so. For commitments to survive in the in- 
terim and allow individuals and organizations to make progress, 
behavioral insights suggest that having clear goals achievable by 
making small steps might be more important than articulating the 
difficult, sweeping goal.

Note, however, that goal setting can come with side effects. 
In a review of the evidence, Lisa Ordóñez and collaborators iden- 
tify the neglect of non-goal-relevant activities, a potential increase 
in unethical behavior in pursuit of meeting the goal, and a reduc- 
tion of intrinsic motivation as some of the core concerns. For ex- 
ample, consider the well-known study of inattentional blindness by 
Dan Simons and Chris Chabris showing that when people are given 
a clear goal, namely to count the number of passes in a video of a ball 
game, they fail to see a man wearing a black gorilla suit walking 
across the screen (he also stops in the middle of the screen and 
pounds his chest!). Thus, when using goals to motivate behavior, 
we need to do so with great care and eyes wide open for potential 
side effects.

But goal setting can help bridge the intention-action gap. Mi- 
crofinance clients intend to invest their loans wisely, but using the 
money to meet immediate needs today is tempting. Similarly, the 
participants in your training programs today might well mean to 
follow through on their virtuous intentions and implement what
they have learned when they are back in their offices—but then,
life takes over and great intentions remain exactly that, great in- 
tentions. To be able to meet our goals, we need to make plans on 
how to get there—much like the German companies that were 
asked to submit their plans on how to increase gender diversity 
in 2015. When persons or organizations are asked to make a plan 
for when, where, and how they will reach a certain goal, the plan 
can serve as a commitment device: a psychological contract that 
they write with themselves. Randomized controlled trials have 
shown that plan making can increase the likelihood that people 
follow through on their intention to vote, to exercise, and to 
get their flu shot. Plan making also helps people meet deadlines— 
quite relevant for someone working on a book.¹⁶

Feedback from others also matters, and that is where networks 
can come in handy again. A field experiment was set up in Uganda 
to evaluate how to best teach female cotton farmers proper growing 
techniques. A standard training program offered to both men and 
women did substantially worse than a program targeting women 
and establishing social ties among them. In the treatment group 
consisting of women only, each participant was randomly paired 
with a partner she did not know beforehand. The partners not only 
learned together during the program, but they also kept checking 
on each other throughout the season. This “buddy” system turned 
out to increase the productivity of most farmers by about 60 percent 
as compared to a 40 percent increase for those who had participated
in the traditional training program only. Similarly strong evidence
on the importance of networks comes from randomized
controlled trials on microfinance in India. When first-time borrowers
met more often and forged stronger social ties, they were
more likely to cooperate with each other later on, suggesting that
group lending might work particularly well because it links people 
to each other in a new social network. 

Social networks provide peer monitoring. Emily Breza of 
Columbia University and Arun Chandrasekhar of Stanford Uni- 
versity show that helping households open an account and devise 
savings goals, coupled with regular visits to check on progress, has 
positive but only relatively modest effects on people’s savings
balances, increasing them by about 10 percent in India. When a peer
monitor from a person’s network is added, however, total savings 
increase by 34 percent. Interestingly, monitors are particularly 
effective when they are not chosen by the savers themselves but 
instead are randomly assigned. People need powerful monitors 
with a central position in their network. Peripheral network 
members do not have the social capital to request such a person on 
their own.

All the available evidence points to the importance of training 
programs that go beyond educating people to building their capacity
as well. Learning how to do something is different from, and less
desirable than, being supported in actually achieving something.
These lessons should be incorporated in training programs
globally, whether in Uganda or in the United States, and whether 
your goal is gender equality or higher agricultural yields. In an 
executive program for the World Economic Forum’s Young Global 
Leaders that I chair at the Kennedy School, for example, we assign 
participants to small leadership development groups at the begin- 
ning of the program. They then meet every morning before class 
to work through a leadership curriculum developed by my colleague,
Bill George, of Harvard Business School. At the end of
the almost two-week-long program, they share insights these small
groups gained with the larger class. More importantly, they are
encouraged to continue meeting, most often virtually, and to form 
a support group, which has proven invaluable for many. These 
highly accomplished men and women, all already in or seeking to 
join organizations of influence, whether in government or busi- 
ness, end up benefiting from the same framework for feedback as 
do Indian microfinance clients and Ugandan farmers. 

To build leadership capacity, leadership training programs tar- 
geting women need to move beyond helping them navigate the 
existing playing field to more sustained interventions that can 
eventually redesign the field. Mentoring, sponsorship, and net- 
working initiatives are a first step in that direction. They provide 
some of the knowledge and skills taught in leadership programs. 
In addition, they can have enduring effects by taking root in or- 
ganizations and serving as commitment devices that help people 
follow through on what they have learned and on the goals they 
and the organization have set. But more systemic interventions are 
required to de-bias the system. And that is what we have to push
for: redesign the environments in which we work, learn, and live.


Designing Gender Equality—Build Capacity 


• Stop showering women (and men) with generic leadership
development training.

• Build leadership capacity by supporting people with the
resources required for success, including mentors or
sponsors and networks.

• Use behavioral design to help people follow through,
with actions such as plan making, goal setting, and
feedback.


Part Two 


HOW TO DESIGN 
TALENT MANAGEMENT 


5 


Applying Data to People Decisions 


“What does not get measured does not count,” a saying goes. Even 
more important, though, is the truism “What does not get mea- 
sured cannot be fixed.” Any organization that hopes to learn and 
improve needs to base its decisions on evidence. This is particu- 
larly true when confronting problems that are the result of system- 
atic unconscious bias. It also explains why a focused, data-driven 
effort to solve gender inequality can yield a double reward. By
bringing the consequences of gender inequality to light, such an
effort can help organizations both do the right thing and invest
resources in those
policies, organizational practices, and structures that yield the 
highest returns. Before you can do any of that, however, you need 
to know what is broken. And to measure that, we are armed with 
not only new knowledge about how the mind works, but new 
tools to assess consequences. When it comes to improving our 
people decisions, few new tools promise to revolutionize human 
resource management as thoroughly as people analytics. It also holds 
out the promise of informing new designs to address gender
inequality.

The US workforce consists of more than 150 million people.
More than 220 million are in the labor force in the European 
Union. In India, corresponding estimates suggest a labor force of 
more than 480 million and in China of almost 800 million. Glob- 
ally, more than 3 billion people work or seek work. For the last 
thirty to forty years, we have been collecting a lot of information 
about these people: what they do, where they work, where they 
went to school, what their demographic characteristics are, how 
they perform and, maybe, even how much money they make. But 
only now are we starting to use this data to improve our people 
decisions.

In its simplest form, people analytics collects large amounts of 
data and uses complex applications to measure relationships be- 
tween variables and detect patterns and trends. For example, was 
your company right in assuming that graduates from the best col- 
leges make the best analysts or salespeople or programmers? Maybe 
a degree from Harvard is highly correlated with job performance, 
or maybe not. Perhaps where an employee received her secondary 
education matters for some jobs, but not for others. You won’t 
know unless you measure. 
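
As a rough sketch of what such a measurement could look like, the Python snippet below computes a Pearson correlation between an indicator for attending a top-ranked college and a performance rating. The records are invented for illustration, and the function name `pearson_r` is an assumption of this sketch, not part of any people analytics product.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical employee records (invented for illustration):
# 1 = graduated from a "top" college, 0 = did not.
top_college = [1, 1, 1, 0, 0, 0, 1, 0]
# Performance ratings on a hypothetical 1-to-5 scale.
performance = [3.8, 3.1, 4.2, 3.9, 3.0, 4.1, 3.5, 3.7]

r = pearson_r(top_college, performance)
print(f"Correlation between top-college degree and performance: {r:.2f}")
```

A coefficient near zero would suggest that the degree tells the company little about performance in that job, while a value near one would support the hiring assumption. Either way, a correlation alone does not establish causation.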

Data analytics has already been applied to combat crime, help 
prevent and manage natural disasters, improve health care, and 
make economies more productive. It has even been credited with 
helping people get elected. President Obama’s campaigns were the 
first to make systematic use of big data, rather than the usual “gu- 
rulike intuition,” to better understand how to mobilize support. 
This is far from just an American phenomenon. One reporter 
summed up the 2014 Indian elections this way: “It’s no exaggeration
to say that Big Data analytics, the process of capturing, managing
and analysing massive amounts of data to generate useful
information, was in part responsible in helping the Bharatiya Janata
Party (BJP) and its allies secure the biggest election victory in more
than three decades.”

Today many large companies use data analytics to better pre- 
dict market trends, manage risk, measure customer needs, create 
improved customer experiences, optimize supply chains, and mon- 
itor compliance. Only a few, however, have started to apply data 
analytics to improve their people practices. One of them is Google. 

Calling its Human Resources department “People Operations,” 
Google has been at the forefront of this trend. The data told Google, 
for example, that an apparent gender gap—women were twice as 
likely to quit as the average Google employee—was in fact a 
“parent gap.” Young mothers, it turned out, were twice as likely 
to quit. So Laszlo Bock, the head of the department, introduced a 
new maternity and paternity leave plan. Instead of the industry 
standard of twelve weeks, new mothers could take five months off 
and new parents seven weeks. The impact was immediate: new 
mothers at Google are now no more likely to leave than the 
average employee.

Turnover is a big concern for many companies. It is expensive 
to find, recruit, train, and retain talent. Relying on big data, 
Google figured out how to best predict people’s likelihood of 
leaving. They now use five diagnostic questions most predictive 
of employees’ quitting. If the aggregate answers to these questions 
come back below 70 percent positive, Google knows it has to take
action; otherwise, the analytics make clear that people are very
likely to leave the following year. The responses to the five ques- 
tions allow Google to identify the issues costing them employees 
and target its interventions accordingly—without relating them to
any specific individual.

Google’s HR department has been compared to an employee
science lab. Data are tracked and experiments are run constantly 
to optimize Google’s procedures. For example, a few years ago, 
Google ran an analysis to determine the optimal number of job 
interviews. Todd Carlisle, director of staffing, collected all inter- 
view scores a candidate had received from the various people he 
or she had met. Repeating this for many candidates, he found that 
the optimal number of evaluators was four—well below the typ- 
ical number of interviewers Google had been using. But the evi- 
dence was irrefutable: four independent assessments were enough 
for the candidate’s average score to converge to a final score. Con- 
sequently, Google significantly cut back its interview times. 
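
One way to picture this kind of convergence is to track a candidate's running average score and ask after how many interviews it stops moving. The Python sketch below is an illustration under stated assumptions, not Google's actual analysis: the scores, the tolerance, and the function names are all hypothetical.

```python
def running_means(scores):
    """Average score after each successive interview."""
    if not scores:
        raise ValueError("scores must not be empty")
    means = []
    total = 0.0
    for count, score in enumerate(scores, start=1):
        total += score
        means.append(total / count)
    return means


def interviews_needed(scores, tolerance):
    """Smallest k such that the running mean after interview k, and after
    every later interview, stays within `tolerance` of the final mean."""
    means = running_means(scores)
    final = means[-1]
    needed = len(means)
    # Walk backward from the last interview while the mean is still close.
    for k in range(len(means), 0, -1):
        if abs(means[k - 1] - final) <= tolerance:
            needed = k
        else:
            break
    return needed


# Hypothetical interview scores for one candidate (invented data).
scores = [2.5, 3.5, 4.0, 3.2, 3.3, 3.3, 3.4, 3.2]
print(interviews_needed(scores, tolerance=0.02))
```

Repeating such a calculation across many candidates, as the analysis described above did, would show where additional interviews stop changing the outcome.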

Google’s People Operations has examined questions ranging 
from how to maximize employee happiness—higher salary, a 
bonus, or more time off?—to how to help employees save for 
retirement. Often collaborating with academics, Google makes 
many of its research findings public. Consequently, we know 
that the predictability that comes with salary increases makes 
“Googlers” happier than the possibility of a windfall bonus. Re- 
muneration isn’t the only thing that makes people happy. Research 
by Elizabeth Dunn and Mike Norton, for example, shows that 
companies can make their employees happier by letting them de- 
cide which charity receives corporate philanthropy. And if you 
want to motivate people to save more, you should remind them 
often and set goals, along with making participation in company 
retirement programs the default. Capturing the ambition behind 
all of these efforts, Prasad Setty, who leads the “people analytics” 
group within People Operations at Google, said: “We make thousands
of people decisions every day—who we should hire, how
much we should pay them, who we should promote, who we
should let go of. What we try to do is bring the same level of rigor
to people decisions that we do to engineering decisions. Our mission
is to have all people decisions be informed by data.”

Organizations should follow in Google’s footsteps. But this is 
easier said than done. A high-level partner in one company I 
worked with told me that his organization could never use the 
word “experiment.” Doing so, he said, would suggest managers 
didn’t know what they were doing. That, I told him, was exactly 
the point! People think they know what they are doing—based 
on a mixture of intuition, best practice, tradition, and industry 
norms. But only evidence can tell. Randomized controlled trials 
are the gold standard of evidence in medicine, the sciences, and 
increasingly in economics, sociology, and psychology (which has 
been employing the experimental technique in the laboratory 
for a long time). We all are thankful that the drugs we take to 
combat a migraine or lower our blood pressure have been tested 
in clinical trials, with treatment and control groups. Not only can 
organizations avail themselves of the same techniques, allowing 
them to fine-tune what works and design processes that lead to 
better people decisions, it is increasingly damning when they 
don’t bother. 

The power of data was jarringly brought to my attention by 
the students at Harvard Kennedy School. One day, I came to my 
academic dean’s office to find a group of them camped out in front 
of my door. They needed to see me urgently, they said. So we 
met. They were concerned about the lack of women faculty. This 
was not a new concern, but given how much of my recent research
had focused on how to equalize the playing field, I found
myself in an odd position. Even as I explained the progress we had
made, I felt defensive. By statistical necessity, I explained, change
was slow: we hired only about five new faculty members a year
on average. I remember running impromptu calculations during 
the meeting, figuring out how long it would take to reach gender 
parity if we hired only women. A long time. 
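
A back-of-the-envelope version of that calculation can be written in a few lines of Python. Only the hiring rate of about five per year comes from the text. The faculty size and the current number of women are hypothetical, and the sketch assumes the best case: total faculty size stays constant and every new woman hired replaces a departing man.

```python
from math import ceil


def years_to_parity(faculty_size: int, women: int, hires_per_year: int) -> int:
    """Years until women make up at least half the faculty, assuming a
    constant faculty size and that every hire is a woman replacing a man."""
    if hires_per_year <= 0:
        raise ValueError("hires_per_year must be positive")
    shortfall = faculty_size / 2 - women
    if shortfall <= 0:
        return 0
    return ceil(shortfall / hires_per_year)


# Hypothetical faculty of 150 with 30 women (assumed numbers),
# hiring five women a year.
print(years_to_parity(faculty_size=150, women=30, hires_per_year=5))
```

Under more realistic assumptions, such as some women leaving and some hires being men, the horizon stretches out considerably further.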

So I dug deeper. Much to my surprise, I realized that it was 
not primarily the number of female faculty or some abstract con- 
cept of gender equality among our faculty that concerned these 
students. Rather, it was the lack of role models for female students. 
They did not care that much about faculty statistics, but they 
wanted to see more women leaders—in the classroom, in semi- 
nars, at conferences, on panels, behind the podium, teaching, 
speaking, researching, tutoring, and advising. It turns out we had 
never paid attention to the gender breakdown of the people vis- 
iting the Kennedy School. On any given day, there are a multi- 
tude of talks and discussions taking place across campus, typically 
with a lead speaker or lead panelists. These experts—political, civil 
society, business, and academic leaders from around the world— 
were invited by different individuals, research centers, programs, 
institutes, and study groups. Some of those invited to campus visit 
for their presentations only and others stay at the school as visiting 
fellows for a year or longer. Many of them interact with our stu- 
dents, and all of them help complete the picture that represents 
Harvard. And we had never before collected their demographics. 

So we did. We asked the sponsoring institutions and entities 
to include in their annual reports to the deans a gender breakdown 
of their guests. And our findings resembled those of most organi- 
zations that collect such data for the first time: the numbers were 
not pretty. However, they initiated a healthy discussion and some 
self-reflection starting in the first year we measured. And, of
course, they have since allowed us to track change over time, design
strategies based on the evidence—many of which are discussed
in this book—and compare notes on what worked.

The potential of data analytics to help close gender gaps is enor- 
mous. After collecting and analyzing large amounts of data in 
the late nineties, the sociologist Janice Fanning Madden of the 
Wharton School found that female stockbrokers in two of the 
largest US stockbrokerage firms earned about 60 percent of what 
their male colleagues made. The stockbrokers received com- 
missions from the sales of securities to their clients. Thus, the 
theory went, the female brokers made less money because they 
sold less. Without looking any deeper into these results, the easy 
assumption became this: women weren’t as good as their male
counterparts. But it turns out the women did not perform worse. 
How was it possible that they were paid less? The data told the 
tale. 

The female brokers were treated differently. They were given 
inferior accounts and sales opportunities. Madden refers to this as 
performance support bias. Neither she nor the firms would have been 
able to deduce this if she had not had access to the stockbrokers’ 
personnel histories, the firm’s trading and asset records, as well as 
information on each broker’s management of accounts. With this 
information she was able to carefully analyze the data to tease apart 
the various competing hypotheses, which included the theory that 
female brokers were less productive than men, with reasons cited 
being innate gender differences, earlier career discrimination, 
and consumer reluctance to work with female stockbrokers. 
None of these factors turned out to matter. Women were given 
worse-performing accounts to start with. In fact, when women 
were given more valuable accounts, the gender gap in performance
disappeared.

We would not know if we had not measured, and the two brokerage
firms would not have been able to fix what was both an
inequitable and an inefficient allocation system of accounts. In this 
instance we know only because class action sex discrimination 
lawsuits filed against the two firms forced them to make their data 
available to Madden, who served as an expert witness in the case.
It isn’t just that what does not get measured doesn’t count; it’s
worse. Set aside the substantial financial and reputational costs of
a lawsuit. Because the best brokers weren’t being identified and
encouraged to be as productive as they could be, the firms served
clients suboptimally and were less successful than they could have
been.

Here’s the good news. Once you collect and study the data, 
you can measure progress. In 1999, MIT acknowledged that it had 
unintentionally discriminated against women. Charles M. Vest, 
then president of MIT, wrote in a preface to a report issued that 
year: “I have always believed that contemporary gender discrimi- 
nation within universities is part reality and part perception. True, 
but I now understand that reality is by far the greater part of the 
balance.” 

Led by Nancy Hopkins, a professor of biology, an examination 
of data had revealed gender differences in salary, space, resources, 
awards, and responses to outside offers. Women faculty were being 
treated significantly worse than their equally accomplished male
colleagues. Nancy and her colleagues’ work incited a debate about
gender inequality in academia across the United States, with some 
critics questioning the quality of the data and the subsequent anal- 
yses. MIT stood by its report and, indeed, its use of data. Then 
dean of the School of Science, Robert Birgeneau, said bluntly: “It 
was data-driven, and that’s a very MIT thing.”

That data had real consequences. A follow-up study, published
in 2011, showed that the number of women faculty in science and 
engineering had almost doubled, and several women held senior 
leadership positions. Inequities in resource allocation and salaries 
had been rectified. Learning from data, it turns out, is a very MIT 
thing, too. MIT is hardly alone. The Swiss government, for ex- 
ample, has developed an online tool called Logib, enabling com- 
panies to measure how well they are doing in terms of gender 
pay equity. More elaborate evaluation tools are now offered by a 
number of private providers and NGOs. I serve on the scientific 
council of one such provider. A few years ago, Nicole Schwab, one 
of the two cofounders and a former student of mine, came to see 
me to discuss a new idea, the Gender Equality Project. The dis- 
cussion turned into advisory work in which we helped develop 
an evaluation and certification tool that allows companies to mea- 
sure progress toward closing gender gaps in their organizations by 
assessing both outcome variables related to pay, recruitment and 
promotion, as well as input variables such as training, mentoring, 
and company policies and practices. 

What started as a “project” has matured into EDGE, a private 
company and foundation led by the other cofounder, Aniela 
Unguresan. Companies that have met a global standard for work- 
place gender equality are EDGE certified. In March 2015, Jim Yong 
Kim, the president of the World Bank, announced that his organ- 
ization would seek EDGE certification, joining already certified 
companies such as Banco Compartamos Mexico, CEPD N.V.
Poland, Deloitte Switzerland, and L’Oréal USA, among others.
In celebration of International Women’s Day, Kim acknowledged 
that the World Bank could only be credible in its efforts to close 
gender gaps around the world if it walked the talk within its own 
organization. The EDGE-collected data allows the World Bank 
to uncover patterns in its management of human resources, better 
understand the dynamics of gender at work within its walls, and 
identify hot spots ripe for intervention. 

These are hindsight insights, using data to discover how cur- 
rent practices are influencing decisions and productivity. Predic- 
tive analytics, however, can also fundamentally change how we 
evaluate candidates in hiring, promotion, and performance ap- 
praisals, substantially decreasing the role that intuition has tradi- 
tionally played. 

Don’t misunderstand me. Subjective performance appraisals are 
here to stay. Very few organizations can do away with them. Yet 
a multitude of studies have shown that the discretion they afford 
supervisors in evaluating their subordinates opens the door to all 
kinds of biases. As discussed earlier, pro-male bias for male-typed 
jobs and pro-female bias for female-typed jobs are prevalent, as is a 
pro-male bias for positions of leadership and authority. What is 
even more surprising, even when there are no gender differences 
in performance appraisals, identical ratings were found to be more 
likely to translate into promotions for men than for women. The 
sociologist Emilio Castilla of MIT found this performance-reward bias 
among employees of a large service organization. After having 
been given the same performance evaluation score as white male 
employees, women and other traditionally disadvantaged groups 
received lower pay increases. The finding was particularly striking 
as Castilla uncovered the bias just when the organization declared 
itself committed to creating a culture of meritocracy by explicitly 
linking rewards to performance. 

To further explore what he later called the “paradox of 
meritocracy,” Castilla and collaborators ran a number of follow-up 
experiments to examine whether emphasizing meritocracy could 
backfire, leading to an increase in biased performance rewards. 
Reminiscent of the licensing effect discussed earlier, the answer, 
sadly, was yes. Merit-based reward practices led to greater male 
favoritism as compared to a non-meritocratic work environment. 
Why is not entirely clear. Research has shown that when people 
are primed to feel objective—being offered the opportunity to dis- 
agree with sexist statements, for example—they afterward prove 
more likely to prefer male over female job candidates. Primed by 
a company’s declared sense of meritocracy, evaluators may have 
felt licensed to act on their biased intuitions. Alternatively, as evi- 
denced by the findings of another study, managers might treat men 
more favorably to avoid tough conversations. Just as women were 
not expected to negotiate and reviewed unfavorably when they 
did so, managers expected women to be more accepting of deviations 
from their performance ratings. 

The performance-reward bias is a substantial challenge for or- 
ganizations interested in paying for performance. Compensation 
and promotion committees often add information not reflected in 
an evaluation score but still deemed relevant for compensation or 
promotion decisions. One of the most common additional con- 
siderations is a person’s judged potential. Having worked with 
companies on their performance evaluations and compensation 
systems, I have found a widespread gender potential bias, in partic- 
ular in male-dominated industries and work groups. And, like all 
biases, it is difficult to debunk. 

Biased judgments are costly—for the employee as well as for 
the employer. And they affect women across industries, including 
in female-dominated sectors. According to a 2015 study published 
in the Journal of the American Medical Association, male registered 
nurses in the United States, who made up 7 percent of those in 
the profession, had higher salaries than the other 93 percent of 
nurses, who happened to be female. The wage gap applied across 
settings, specialties (with one exception, orthopedics), and posi- 
tions and did not significantly change over twenty-five years (from 
1988 to 2013). 

The gender pay gap is prevalent around the world, although the 
size of the gap varies by country, sector, and specific methodology 
used to assess it. The US Department of Labor reported a gender 
pay gap of 21 percent in 2014, where the gap represents the differ- 
ence in men’s and women’s median earnings. The pay gap in 2014 
was largest in Louisiana, where women were paid 65 percent of 
what men were paid, and smallest in the District of Columbia with 
an earnings ratio of 90 percent. In the European Union overall, the 
gender pay gap was 16.4 percent in 2013, where the gap represents the 
difference between average gross hourly earnings of male employees 
and of female employees. The country in the EU with the biggest 
gender pay gap, 29.9 percent, is Estonia, and the country with the 
smallest gender pay gap is Slovenia, with 3.2 percent. In terms of the 
OECD countries, overall, the gender pay gap was 15.5 percent in 
2014, representing the difference between men’s and women’s 
median earnings. The OECD country with the biggest gender 
wage gap was Korea, with 36.6 percent, and the country with the 
smallest gender wage gap was New Zealand, with 5.6 percent. 
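
The figures above mix two conventions: an earnings ratio (women's pay as a share of men's) and a pay gap (the shortfall relative to men's pay). A minimal Python sketch of the conversion follows. It uses only percentages quoted in the text, and the function names are illustrative assumptions, not any agency's official method.

```python
def pay_gap_from_ratio(earnings_ratio_pct):
    """Gap (in percent) implied by women's earnings as a percent of men's."""
    return 100.0 - earnings_ratio_pct


def pay_gap(men_earnings, women_earnings):
    """Gap as a percent of men's earnings; inputs may be medians or means."""
    return 100.0 * (men_earnings - women_earnings) / men_earnings


# Earnings ratios quoted in the text for 2014.
louisiana_gap = pay_gap_from_ratio(65)  # women paid 65 percent of men's pay
dc_gap = pay_gap_from_ratio(90)         # earnings ratio of 90 percent

print(louisiana_gap, dc_gap)
```

Which statistic goes into `pay_gap` matters: the US and OECD figures compare median earnings, while the EU figure compares average gross hourly earnings, so the percentages are not directly comparable across sources.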

Not all of the gap is due to discrimination. My own discipline, 
economics, typically only deems someone to be discriminated 
against if people with the same qualifications and productivity in 
the same occupation are treated and compensated differently. This 
narrow definition does not capture earlier discrimination in access 
to education or skills. But even then, there remains a sizable 
“unexplained residual” that cannot be explained by any other 
characteristic than gender. The gender earnings gap also hides the 
fact that neither all women nor all men are treated equally. Race, 
ethnicity, geography, and sexual orientation, among other char- 
acteristics, all matter. For example, gay men have been shown to 
earn less and lesbian women to earn more than heterosexual men 
and women, respectively. In addition, both motherhood penalties 
and fatherhood premiums are well documented. 

Gender bias also hurts employers. Employees who feel discrim- 
inated against are less motivated to work and more likely to quit. 
Favoritism also sets the wrong incentives—indeed, it depresses 
effort among the discriminated and among the favored. The former 
know that there is no point and the latter know that there is no 
need. Favoritism sets arbitrary incentives and leads to the wrong 
people being promoted or assigned to jobs they are unqualified 
for. In addition, subordinates might try to influence their supe- 
riors to obtain better ratings and waste productive time lobbying 
for the desired outcomes. This is particularly true the more impor- 
tant incentive pay is. Salaried employees have little room to ma- 
neuver, but if money is on the line and, for example, the bonus 
makes for a sizable fraction of a person’s compensation, the super- 
visor has real power to affect subordinates’ welfare. These effects 
can be long lasting. A meta-analysis concludes that performance 
appraisals can turn into self-fulfilling prophecies. The effects 
appear to be particularly pronounced for men in the military 
and for people for whom low expectations were held from the 
beginning. 

Most managers like their discretion. I know that remains true 
for me. But, while I believe that I have some information about 
my team’s performance that would likely not be captured by a 
formal evaluation system, I am keenly aware that it is almost im- 
possible for me to be objective. There are just too many biases for 
me, or anyone, to monitor and fight them all. Thus, the real chal- 
lenge organizations face is how to design evaluation and compen- 
sation procedures that balance the costs of supervisor bias with the 
benefits of an informed supervisor’s discretion. 

One small but surprising idea is to experiment with a behav- 
ioral intervention proven to increase ethicality in other domains. 
Lisa Shu and colleagues show that people behave more ethically 
when they sign a form before filling it out (instead of afterward, 
as is customary). Making morality salient before people are tempted 
to understate income, misrepresent expenses, or play favorites 
focuses the mind on honesty in a different way than signing after 
reporting, when it is tempting to come up with ex-post justifica- 
tions. Perhaps we can decrease favoritism in performance appraisals, 
nudging supervisors toward honesty by having them sign first? An 
experiment to be run. 

Another solution is to hold supervisors accountable. Few firms 
track the impact of performance evaluations on job assignments, 
promotions, and the employee’s subsequent performance. Super- 
visors should know that their assessments matter, not just to the 
employee but also to the firm. Favoritism is unjust and costly, and 
should be made costly for the supervisor as well. If supervisors are 
responsible for the promotions in their department, part of their 
compensation should depend on the department’s performance, 
providing them immediate incentives to promote the most talented 
rather than the most favored employee. If promoted or recommended 
people are assigned elsewhere, more creative mechanisms 
need to be put in place. Analytics can help track and compare such 
employees with large numbers of others, controlling for many 
other variables that might also affect performance. 

One study, for example, examined data of more than 8,000 
employees in a financial-sector firm, finding a larger gender gap 
in bonuses and variable pay than in base salary or merit raises after 
controlling for performance. The former were distributed without 
formal rules; the latter were subject to formal rules. Generally, 
formulaic approaches measuring individual performance to deter- 
mine compensation work better for women. Ideally, firms mea- 
sure their performance relative to the performance of appropriate 
comparators to make sure they do not reward or punish for swings 
in the market or other idiosyncrasies specific to a firm or a team. 
Google explicitly uses such comparisons to help its supervisors 
calibrate their performance ratings and guard against bias. After 
having assessed their team members, managers meet to compare 
assessments across groups. 

Some firms employ rankings and curves to force supervisors 
to calibrate and be discerning among their subordinates. One 
ranking I have come across when working with a company was 
based on a three-point scale and had managers assign the best rating 
to 20 percent of their employees and the worst rating to 10 percent, 
with the remaining 70 percent being clustered in the middle. But 
there is nothing magic about this particular scale. In fact, the evi- 
dence on rating and ranking systems in performance management 
is not conclusive, and their impact on male and female employees 
is understudied. 

One of the few exceptions is the economist Iwan Barankay, 
who, working with a large office furniture company, set out to 
measure the impact of different evaluation systems on men and 
women. The company’s salespeople were paid based on their absolute 
performance, measured by the value of their sales. The 
salespeople, about half women and half men, recorded their sales 
and learned about their commission rates and earnings on a pri- 
vate webpage. In the treatment group, salespeople were informed 
of their rank, based on the sales they had made to date, compared 
to their colleagues. It was kept private, with their ranking ap- 
pearing next to their name on their personalized webpage. Also, 
their ranking did not affect their pay. Still, it influenced salespeo- 
ple’s performance—at least of some of them. Showing employees 
their rank decreased men’s performance but did not affect women. 
Saleswomen, it turned out, cared less about their rank and, in con- 
trast to their male colleagues, were not demoralized by knowing 
that they had received a lower rank than expected. 

Independent of its gender effects, the evidence on the impact 
of sharing performance information remains unclear, however. 
Covering more than 13,000 subjects across some 130 studies in 
psychology, an early meta-analysis revealed mixed results. In about 
two-thirds of the studies, feedback increased performance, and in 
the other third, it decreased it. Many details seem to matter, which 
is why it is important for companies to collect their own data and 
begin collecting as soon as possible. 

For all its promise, and as much as I urge companies and orga- 
nizations to embrace it, people analytics is not a first-best solu- 
tion. Data collection has the potential to invade our privacy. It also 
opens the door to categorical judgments based on demographic 
characteristics. In addition, there are methodological concerns. It 
is tempting to make causal inferences based on correlations, which 
is wrong. But now that department stores can accurately predict a 
customer’s pregnancy based on her shopping behavior (and inform 
her family with marketing materials before she is ready to share 
the news), it is safe to say that the influence of big data, for good 
or ill, is not going away. 

For most jobs, subjective performance appraisals will remain 
relevant—and this is where even a second-best approach can make 
things better. We now know a tremendous amount about the in- 
fluence of unconscious biases on our intuitive judgments. At first 
blush, female stockbrokers seemed to underperform male stock- 
brokers, but in fact the unconscious biases of superiors were 
responsible for their assignment to weaker accounts—and clients, 
brokers, and firms suffered. People analytics can help us check the 
intuitive associations we make. 

In many ways, people analytics is reminiscent of an earlier 
discussion that took place mainly among psychologists more than 
half a century ago. In his 1954 book, Clinical vs. Statistical Predic- 
tion: A Theoretical Analysis and a Review of the Evidence, Paul Meehl 
concluded that simple statistical algorithms tend to beat the pre- 
dictions of experts. Many more experiments pitting humans 
against machines followed, providing strong support for the early 
findings. Simple models, often using just a handful of variables 
and linear specifications, were shown to outperform profes- 
sionals’ judgments in business, ranging from estimating the like- 
lihood of success of new businesses to the career satisfaction of 
employees; in public policy, predicting recidivism among criminals; 
and in medicine, diagnosing diseases and estimating survival 
probabilities, among others. 

Over the course of twenty years, Philip Tetlock invited almost 
300 experts—economists, political scientists, and policy- and other 
decisionmakers—to make thousands of predictions that he compared 
with actual outcomes. Topics ranged from economic 
performance to interstate violence to nuclear proliferation. Soberingly, 
the experts did rather poorly, not much better than “dart-throwing 
chimps,” Tetlock writes in his book Expert Political Judgment. 
They slightly outperformed Berkeley undergraduates, and 
did significantly worse than an algorithm extrapolating from the 
past and predicting more of the same for the future. The cumu- 
lative evidence is starting to have an impact. Moneyball, the 2004 
best seller by Michael Lewis, demonstrated that even in sports it 
was better for scouts to entrust a machine with deciding which 
players to acquire. Developed by a Harvard graduate, the algo- 
rithm led the Oakland A’s to 103 wins for the season, one of the 
best records in baseball. In the meantime, almost all major leagues 
have replaced human intuition with formulas, and other sports 
have followed suit. 

But despite the preponderance of the evidence, then and now, 
people remain skeptical of algorithmic judgment. A review of the 
evidence suggests that the instances when human judgment out- 
performs algorithms are extremely rare and typically involve sit- 
uations where people have important information that the machine 
does not. But even then, challenges remain, for people cannot 
distinguish between beneficial adjustments due to information 
asymmetries and harmful adjustments due to overconfidence. 

A group of researchers from the Wharton School documents 
the prevalence of what they refer to as algorithm aversion. Across 
firms and forecasting domains (including banking, manufacturing, 
and beauty), professionals either did not use algorithms at all or 
placed too little weight on them. Strangely, people became even 
more averse to algorithmic forecasts after they saw them outperform 
human forecasters. Across five studies where participants 
either observed forecasts made by an algorithm, a human, both, 
or neither, those having seen the algorithm perform became less 
confident in it and less likely to prefer even a more accurate algo- 
rithm over the inferior human. We do not forgive an algorithm when 
it makes errors, even when it makes significantly fewer errors 
than humans do. 

Digging deeper, the researchers learned that people think that 
while algorithms might well outperform human judgment on av- 
erage, the algorithms are perceived as being unable to learn from 
their mistakes or respond to unusual events. Such agility of thought 
and ability to update based on past experiences is ascribed to 
humans—despite much evidence showing that when humans, 
drawing on their own observations, “correct” algorithms, they 
often make the algorithms less accurate. But could these adjust- 
ment costs be outweighed by the benefits people derive from being 
involved? Maybe giving people some control over algorithms 
might decrease their aversion, make them more willing to use 
them, and consequently improve forecasts. 

It turns out we can design our way out of algorithm aversion. 
To test this, the researchers asked participants to predict students’ 
performance on a number of tests based on some background 
information, for example, highest degree earned, number of friends 
not going to college, favorite school subjects, or whether or not 
they had taken any advanced placement tests. Study participants 
were paid for the accuracy of their predictions. Allowing people 
to “correct” an algorithm’s forecasts decreased their aversion and 
made them more likely to use it. Interestingly enough, people do 
not appear to require a whole lot of leeway. Rather, the tiniest 
increase in their ability to make adjustments makes them more 
satisfied with the process and helps them remain confident in the 
algorithm even after it errs. Thankfully, a small degree of freedom 
is enough to substantially reduce their skepticism and let the 
algorithm do its work and—as the data would have forecast— 
outperform the human mind. 
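
One way to picture “a small degree of freedom” is a forecast that a person may nudge, but only within a fixed band around the algorithm's prediction. The sketch below is an illustration under stated assumptions, not the researchers' actual procedure: it assumes forecasts on a numeric scale, and the function name and the cap `max_adjust` are hypothetical.

```python
def adjusted_forecast(model_forecast, human_forecast, max_adjust):
    """Return the human's forecast, clamped to within max_adjust of the model's."""
    lower = model_forecast - max_adjust
    upper = model_forecast + max_adjust
    return max(lower, min(upper, human_forecast))


# Hypothetical case: the model predicts a score of 60 and the evaluator
# may move that forecast by at most 5 points in either direction.
print(adjusted_forecast(60, 80, 5))  # large override is capped at 65
print(adjusted_forecast(60, 58, 5))  # small adjustment passes through as 58
```

The cap keeps the final forecast close to the algorithm's, so any loss of accuracy from human “corrections” is bounded, while the evaluator still has a say.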

The message of this chapter is straightforward: we should make 
more use of data and data analysis. We should use big data to un- 
derstand whether there are gender gaps in pay or promotion in our 
organizations, diagnose why, and inform interventions designed 
to close the gaps. And we should employ experiments to evaluate 
whether or not our interventions work. Big data can also be used 
to better predict which employees and students will be most likely 
to succeed in our organizations and schools—and under which 
conditions. Algorithms will help. We can overcome our aversion 
to them by giving people the opportunity to intervene and adjust, 
ideally not too much lest we decrease the accuracy of the algo- 
rithms, but enough for people to be willing to use them. 


Designing Gender Equality—Apply Data to 
People Decisions 


• Collect, track, and analyze data to understand patterns 
and trends and make forecasts. 

• Measure to detect what is broken and refine interventions. 
Experiment to learn what works. 

• Give people some leeway to adjust algorithmic 
judgments. 


6 


Orchestrating Smarter 
Evaluation Procedures 


Singapore’s Ministry of Manpower (MOM) was owed money. 
Failing to comply with the law, employers of foreign domestic 
workers were not paying MOM their levies (temporary taxes). The 
ministry could have gone about trying to get its money in a number 
of ways. It decided to design a simple, inexpensive intervention 
that relied on the power of pink. In a field experiment, it sent half 
of the defaulting employers the usual reminder letter on white 
paper. The other half received pink letters, which included other 
design improvements. It worked. A far greater percentage of em- 
ployers receiving the pink letter paid up. 

There is nothing magical about the color pink. In Singapore, 
however, it is a powerful internal referent. Chew Ee Tien of the 
Behavioural Insights and Design Unit at MOM explains: “Mo- 
bile and utilities companies often print their late letter notices on 
pink paper. As such, the colour reinforces the message that payment 
is late.” 

When designing solutions, the smallest details matter. The 
MOM design was influenced by one of the core insights of 
behavioral science: judgments are comparative. Whereas white 
paper recalls typical, undifferentiated mail, pink paper recalls a 
comparison to other late payment reminders. White stationery 
piles up in a way that pink stationery does not. 

It is almost impossible for us to evaluate anything in absolute 
terms. Whether or not you like a particular cup of coffee has some- 
thing to do with the type of coffee you typically drink. Whether 
or not you find yourself shivering or dying from heat in a confer- 
ence room has something to do with the kinds of temperatures you 
are used to. Europeans visiting the United States tend to freeze in 
our conference rooms and restaurants. And Americans squirm and 
shed layers in the steamy boardrooms of Europe. Similarly, when 
we evaluate people, we instinctively compare them with others. 

Consider the following problem: Linda is thirty-one years 
old, single, outspoken, and very bright. She majored in philos- 
ophy. As a student, she was deeply concerned with issues of 
discrimination and social justice, and also participated in anti- 
nuclear demonstrations. 

Now, based on the above description, rank the following statements 
about Linda, from most to least likely: 


a. Linda is an insurance salesperson. 
b. Linda is a bank teller. 
c. Linda is a bank teller and is active in the feminist movement. 


If you are like most people, you think that Linda is most likely “a 
bank teller and is active in the feminist movement.” Linda, the 
feminist bank teller, fits best with how she was described, even 
though we may have some lingering doubts about her career as a 
banker. 
This is one of the most widely used tests that help people experience 
the representativeness heuristic we encountered earlier 
when thinking about who the typical Florida resident is (re- 
member: not the elderly). It shows how our intuitions can lead us 
astray. Recall that you were asked to rank, from most to least likely, 
statements about Linda. It cannot be that the category “bank teller 
and active in the feminist movement” is more likely than a cat- 
egory, “bank teller,” that subsumes it. A feminist bank teller 
automatically is also a bank teller—but not every bank teller, of 
course, is a feminist bank teller. 
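
The subset argument can be checked by counting. The sketch below builds a hypothetical population of 100 women who match Linda's description. The counts are invented purely for illustration and come from no study; whatever numbers are chosen, the feminist bank tellers can never outnumber the bank tellers.

```python
# Hypothetical population of 100 women matching Linda's description.
# The counts are invented purely for illustration.
people = (
    [{"bank_teller": True, "feminist": True}] * 4
    + [{"bank_teller": True, "feminist": False}] * 1
    + [{"bank_teller": False, "feminist": True}] * 80
    + [{"bank_teller": False, "feminist": False}] * 15
)

# Feminist bank tellers are drawn from the bank tellers, so they form a subset.
bank_tellers = [p for p in people if p["bank_teller"]]
feminist_bank_tellers = [p for p in bank_tellers if p["feminist"]]

p_teller = len(bank_tellers) / len(people)
p_feminist_teller = len(feminist_bank_tellers) / len(people)

print(p_teller, p_feminist_teller)
```

Even in this population, where almost everyone is a feminist, the share of feminist bank tellers stays at or below the share of bank tellers.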

While logic dictates that there must be more bank tellers than 
feminist bank tellers, when contemplating Linda that point is not 
intuitive. Linda looks more like a feminist than a bank teller and 
thus, when we add a representative piece about her—feminist— 
to an unrepresentative piece—banker—she becomes more alive. 
Linda’s description as a feminist bank teller is psychologically more 
appealing, albeit less probable, than her being just a bank teller. 
When economists Sendhil Mullainathan of Harvard University 
and Marianne Bertrand of the University of Chicago sent compa- 
nies (fictitious) resumes, Lakisha Washington and Jamal Jones re- 
ceived fewer callbacks than the otherwise identical Emily Walsh 
and Greg Baker. Names that sounded white got 50 percent more 
callbacks than the African-American sounding names; indeed, 
Lakisha and Jamal needed eight more years of relevant work ex- 
perience to gain equivalent attention. Across four occupational 
categories—sales, administrative support, clerical services, and cus- 
tomer services—Lakisha and Jamal were consistently perceived as 
the black salespersons or black assistants, while Emily and Greg 
were just salespersons or assistants. Just as Linda was a feminist first, 
for a clear majority of employers, when hearing Lakisha’s and 
Jamal’s names, the internal referent is a black person, first and 
foremost. 

Together with Max Bazerman and Alexandra van Geen, I set 
out to find an intervention that would debunk the internal ref- 
erent. In retrospect, the idea seems quite simple. We introduced 
an explicit comparator. Put differently, we wanted to confront 
evaluators with two applications at the same time. This would help 
them, we hoped, to explicitly compare job applicants rather than 
to implicitly judge them based on the internal referent. 

This is how the experiment worked. Study participants had to 
hire a candidate for either a stereotypically male task, a math 
problem; or a stereotypically female task, a word assignment. They 
were paid based on their chosen candidate’s performance. In the 
control condition, evaluators were informed of one single candi- 
date’s past performance and his or her sex (plus a number of filler 
characteristics that were identical for all candidates; for example, 
that they all came from the greater Boston area). In the treatment 
condition, in addition to the sole candidate introduced in the 
control condition, we added information on one additional candi- 
date. Control condition evaluators had to decide whether to go 
with the candidate presented to them or be assigned a candidate at 
random, pulled from the pool of candidates. In the treatment con- 
dition, evaluators had to decide whether to go with one of the two 
candidates presented to them or draw from the pool. They were 
informed of how well the candidates in the pool had done on av- 
erage, thus they knew what to expect by going back to the pool. 

The experiment was designed to mimic real hiring and pro- 
motion decisions. According to one study, about half of real-world 
evaluators look at one candidate at a time. The other half pools a 
number of applicants and screens various candidates simultaneously. 
Another survey, this one of senior executives in large US 
companies, suggests that about 30 percent of promotion decisions 
involve one candidate only. 

In our experiments, when evaluators looked at candidate 
profiles individually, men were more likely to be hired for the 
math task and women for the verbal task, including those who had 
performed below par. Our intervention, where evaluators were 
exposed to more than one candidate, was able to overcome these 
stereotypical assessments. Comparative evaluation focused eval- 
uators’ attention on individual performance instead of group 
stereotypes. When candidates were evaluated comparatively, not 
only did the gender gap vanish completely, but basically all evaluators 
now chose the top performer. 

This makes comparative evaluation not just more fair, but also 
profit maximizing. The right thing turned out to be the smart 
thing, too. Without hesitation, I recommend comparative eval- 
uation procedures to all organizations I work with. It is an im- 
mediately available design that promotes gender equality and an 
improved bottom line. At the Kennedy School, whenever possible, 
we bundle our junior faculty searches. This allows us to not only 
evaluate the candidates comparatively, but evaluate searches com- 
paratively, making performance criteria explicit and allowing us 
to calibrate our judgments across searches. By doing so, we also 
benefit from another behavioral insight, namely that variety is 
more likely to emerge when people make multiple decisions 
simultaneously rather than sequentially. In one experiment, re- 
searchers had volunteers choose three snacks—apples, cookies, and 
bagels—to be consumed on three different days—Monday, 
Tuesday, and Wednesday. The treatment group chose the snacks 
simultaneously. For example, on Monday morning they had to 
decide what snack they would eat that afternoon and each afternoon 
for the next two days. The people in the control group chose 
sequentially: they chose one snack each day. About two-thirds of 
the people who made choices simultaneously selected three 
different snacks, but only 9 percent of the people who made se- 
quential choices selected three different snacks. 

Such variety-seeking with simultaneous choice has been doc- 
umented repeatedly. For example, people have also been found to 
buy a greater variety of yogurt when buying more yogurt at the 
same time. In many instances, diversification is a good thing. Most 
investment advisors will urge you to diversify your portfolio. Most 
nutritionists will recommend a varied diet. There are also obvious 
risks: the person who buys twelve flavors of yogurt may discover 
they dislike eight of them. Depending on your goals, package deals 
may or may not be optimal. 

We should be worried, however, when employers keep se- 
lecting employees of the same kind, namely employees who look 
like them. Or as Carlos, a Hispanic attorney, told the sociologist 
Lauren Rivera when she asked what evaluators looked for in a job 
candidate: “You . . . use yourself to measure [fit] because that’s all 
you have to go on.” Analyzing hiring in investment banks, law 
firms, and management consulting firms, Rivera found that cul- 
tural fit, the degree to which a candidate’s backgrounds, hobbies, 
and self-presentation were similar to a company’s existing em- 
ployee base, was decisive in candidate evaluations across all three 
sectors. More than half of the evaluators interviewed described fit 
as the most important consideration when interviewing job can- 
didates, more important than, for example, analytical thinking or 
communication skills. The preference for fit was most pronounced 
in law firms and least in consulting firms. 

In some cases, unconscious bias translates into a deliberate 
strategy. Denise, a white attorney, reported: “I really do think it’s 
about finding . . . something in common with your interviewee.” 
To top it off, a banker named Arielle described one of her best 
interviews ever: “She and I both ran the New York marathon . . . 
we talked about that and hit it off . . . we started talking about how 
we both love stalking celebrities in New York . . . we had this in- 
stant connection . . . I loved her.” 

Whether due to the chemistry that might develop between an 
evaluator and a job candidate or unintentional associations made 
during a job interview, we cannot help but be influenced by ir- 
relevant details—a shared joy of celebrity stalking, the color of an 
application letter, or a person’s appearance. Maybe an applicant’s 
jacket is your favorite shade of blue. While this fact is unlikely to 
be relevant for the job he or she is applying for, after having seen 
a color you like, you will be more favorably disposed toward that 
applicant. This correlation, the halo effect introduced in Chapter 2, 
has been demonstrated again and again and belongs to the category 
of confirmation biases where first impressions affect how we assess 
subsequent information. 

I find confirmation bias one of the most challenging obstacles to 
smart decision making. When people search for and assess infor- 
mation, they tend to favor evidence that confirms their existing 
beliefs. But it keeps them from assessing new information objec- 
tively. Indeed, it impedes their ability to learn. The consequences 
can be life threatening. Consider the following simulation con- 
ducted in Toronto: Medical doctors were given information on 
sixty-four patients having a heart attack. The doctors had to 
imagine being in the emergency room and having to make a ten-second
decision based on a six-point history for each patient. They
had two (fictional) drugs available. It was clear that they had to
learn by doing, updating their beliefs about which drug worked 
best for which patient based on the feedback they received. 
After administering one of the two drugs, they were informed 
of whether it had been a success or not. 

About a quarter of the doctors were able to work out the ap- 
propriate treatment for the patients: one of the drugs worked better 
for those with diabetes, the other for those without. What differ- 
entiated the minority of learners from the majority was visible in 
their brains: based on fMRI scans, the frontal lobes of the learners’ 
brains were particularly active whenever a treatment failed. They 
seemed to “learn from their mistakes.” In contrast, the laggards’ 
frontal lobes showed heightened activity when the treatment 
worked. The researchers conclude that a process of disconfirma- 
tion, ruling out alternative hypotheses, is required to learn. If you 
fall into the majority and seek out confirming evidence, you are 
more likely to be fooled by wishful thinking and random luck.° 

Learn from your mistakes! Don’t celebrate successes (at least, 
not when you need to learn). Unfortunately, we know that simple 
admonishments do not work, at least not very well and in many 
cases not at all. Our desire for internal coherence makes us more 
likely to continue on the path we are on, our existing biases in- 
fluencing, perhaps even dictating our choices. Consider the attrac- 
tiveness halo effect. Given that in many situations we see people first 
and speak with them second, this halo effect has been studied in 
detail. A wealth of research suggests that attractive individuals are 
not only assumed to be more honest and responsible, they are also 
perceived as more intelligent. The beauty premium yields real returns
in the labor market. Controlling for everything else, more
attractive people earn more than less attractive ones.

Let’s unpack this phenomenon a bit more. To better understand
what people see in attractive counterparts, the economists James 
Andreoni and Ragan Petrie designed an experiment to measure 
people’s willingness to cooperate with others. They found that 
people were more likely to assume (incorrectly) their counterparts 
would contribute to the public good when they were more attrac- 
tive. It turns out attractive individuals are no more and no less 
cooperative than less attractive ones. When all participants learned 
about everyone’s level of contribution, Andreoni and Petrie discov- 
ered that the beauty premium turned into a penalty. Given people’s 
higher expectations of attractive individuals, their “only average” 
level of cooperation generated disappointment and an unwilling- 
ness to cooperate with them. 

Above-average expectations of physically attractive people 
have been reported in many different contexts. In other research, 
attractive study participants were expected to be better at solving 
mazes, leading them to be offered higher wages. The above- 
average expectations were supported by the fact that attractive 
people were more self-confident and socially skilled. But it all 
proved unwarranted: beautiful people, it turns out, do not solve 
more mazes. 

Adopting what worked for orchestras, namely having applicants 
audition behind a curtain, is not going to work in any number of 
situations, but at a minimum, countries, including much of Eu- 
rope and Israel, that still encourage applicants to include headshots 
with their resumes are well advised to stop. To explore whether 
employers paid attention to these photographs, researchers sent 
more than 5,000 CVs in pairs to about 2,600 advertised job openings
in Israel. Resumes with no picture attached were paired with
resumes to which a photo of either an attractive man or woman,
or an average-looking man or woman was attached. It turns out
that employers were significantly more likely to call back attrac- 
tive men than either average-looking men or men submitting re- 
sumes with no picture. Now pause: based on all you have read and 
your own intuition, how do you think employers responded to 
female resumes? 

Somewhat surprisingly, attractive women did not reap a beauty 
premium. Employers demonstrated a clear preference for female 
CVs with no picture attached. One of the world’s foremost ex- 
perts on the economic consequences of the beauty bias, the econ- 
omist Daniel Hamermesh of the University of Texas at Austin, tells 
us that we should not have been quite so surprised. In the Western 
countries studied to date, looks seem to matter more for men’s 
labor market outcomes. In the United States, for example, better- 
looking men earn up to 5 percent more and worse-looking men 
up to 13 percent less than average-looking men, controlling for 
their education and experience. The effect is smaller for American 
women, though their earning potential is also improved or hurt 
by their looks. 

Discrimination based on appearance is widespread, although 
the specific gender dynamics vary across countries. In the United 
Kingdom, the appearance penalty is also larger for men than 
for women, but the beauty premium does not differ across the 
sexes. In contrast, in China, women are more strongly affected. 
Below-average-looking women earn 31 percent less and above- 
average-looking women 10 percent more than average-looking 
women. Below-average-looking men earn 25 percent less and 
their above-average-looking counterparts 3 percent more. Height 
is also a factor. Taller people earn more and are more likely to get
hired and be promoted up the career ladder.

Favoring the good looking strikes many of us as the epitome
of shallowness, but attractiveness arguably influences the course 
of global events. William McKinley, the twenty-fifth president of 
the United States, assassinated in 1901, was the last president to 
be shorter than the average American man. And it is not just in 
the United States that attractive candidates are more likely to be 
elected. Knowing nothing about the candidates other than their 
looks, research participants (often foreigners) were able to accu- 
rately predict election outcomes in Finland, France, Germany, 
Sweden, Switzerland, and the United Kingdom based only on 
candidate photographs. 

Clearly, looks help predict election outcomes—but do they also 
predict performance on the job? Although the debate is ongoing, 
the evidence presented in Hamermesh’s fascinating book Beauty 
Pays does not suggest that beauty is a credible marker for under- 
lying characteristics such as intelligence. When beauty pays, it ap- 
pears to be mostly based on stereotypes. There may well be cases 
where customers prefer interacting with better-looking salespeople, 
translating into real economic returns for the company—but not 
because the attractive salespeople are more savvy. Rather, customers 
discriminate against worse-looking employees.’ 

If you want to hire talent and find the best match of skills for 
the task at hand, then jumping to conclusions based on first im- 
pressions (and, worse, a photograph appended to a resume) is not 
a smart strategy. Fortunately, we can design our way out of this 
thicket of biases. Simply acknowledging that you will be influenced 
by first impressions is a step in the right direction. Companies I am 
working with, learning from orchestras who hold blind auditions, 
now experiment with “electronic curtains.” This entails not only
the obvious—removing headshots from resumes—but removing
all other demographic information, including names, from electronic
job applications before reviewers see them.
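
As a rough illustration of such an electronic curtain, the sketch below strips identifying fields from an application record before a reviewer sees it. The field names and the `blind_application` helper are hypothetical, chosen only for this example; a real applicant-tracking system will have its own schema, and free-text fields may still leak identity.

```python
# Hypothetical set of identifying fields; adapt to your own application form.
IDENTIFYING_FIELDS = {"name", "photo", "address", "nationality", "gender", "age"}


def blind_application(application, candidate_id):
    """Return a copy of the application without identifying fields.

    The reviewer sees only an opaque candidate ID plus job-relevant content.
    The original record is left untouched.
    """
    blinded = {
        field: value
        for field, value in application.items()
        if field not in IDENTIFYING_FIELDS
    }
    blinded["candidate_id"] = candidate_id
    return blinded


# Illustrative record, not real data.
application = {
    "name": "Jane Doe",
    "photo": "jane.jpg",
    "address": "12 Example Street",
    "work_sample_score": 8,
    "years_experience": 5,
}
blinded = blind_application(application, candidate_id="C-017")
print(sorted(blinded))
```

Keeping the mapping from candidate ID back to the applicant outside the reviewers' view is what makes the curtain hold until the evaluation is complete.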

Note the word “experiment” in the paragraph above. Even 
with something seemingly as straightforward as blind evaluations, 
much remains to be learned. In 2009, the French government 
launched an interesting experiment that would affect all firms that 
made use of the services of the public employment agency, Pôle 
Emploi (which until 2005 had the legal monopoly on matching job 
seekers with firms). Pôle Emploi invited firms to voluntarily participate
in a blind recruitment process where the applicant’s name,
address, nationality, and picture would be removed. Depending 
on a given firm’s preferences, Pôle Emploi employees thus either 
took the identifiers off applications or not in its standard pre- 
screening process before sharing the resumes with the employers. 

Three economists, Luc Behaghel, Bruno Crépon, and Thomas 
Le Barbanchon, analyzed the impact of blind evaluations on the 
likelihood that members of traditionally disadvantaged groups— 
immigrants, children of immigrants, or residents of deprived 
neighborhoods—would be invited to an interview and eventually 
hired. Based on a sample of 600 firms, they found a surprising 
result: anonymization reduced the chance that a member of a dis- 
advantaged group received an interview and eventually was hired. 
It turns out that a selection effect contributed to this unexpected 
and unintended outcome: the 62 percent of all of the invited firms 
voluntarily choosing to participate in the program were those that 
previously had hired many people of disadvantaged backgrounds, 
or as the authors state: “Anonymization then prevented selected 
firms from treating minority candidates more favorably during the 
experiment.” Experimenting and evaluating what works, therefore,
remains as important as ever. While other studies have found that
blind evaluations indeed leveled the playing field, selection effects
can lead to unintended consequences.® 

But you can do more than anonymize applications. You could 
belong to the first generation of managers to overcome what a 
2008 article on employee selection referred to as “the greatest 
failure of I-O [industrial and organizational] psychology”: the 
inability to get employers to rely on “decision aids,” including 
tests, structured interviews, and a combination of mechanical 
predictors “that substantially reduce error in the prediction of 
employee performance.” When surveying human resource man- 
agers, time and again unstructured interviews receive the highest 
ratings for perceived effectiveness—higher than, for example, ap- 
titude tests, personality tests, or general mental ability tests. How- 
ever, when compared to such tests, unstructured interviews fare 
worse. The resistance to analytical approaches seems to be driven 
by the two factors discussed in the previous chapter: people’s over- 
confidence in their own expertise and experience, and the dislike 
of probabilistic forecasts. And for precisely those reasons we are 
reluctant to believe, and act on, the evidence. 

But know this: the data showing that unstructured interviews 
do not work is overwhelming. A few years ago, a rather unique 
opportunity to measure the value an interview adds to perfor- 
mance predictions presented itself in Texas. Realizing that Texas 
did not have enough physicians, the state legislature required the 
University of Texas Medical School at Houston to increase the 
class size of entering students from 150 to 200 after the admis- 
sions committee had already chosen its preferred 150 students from 
a pool of 2,200 applicants. Thus, in mid-May, the committee had 
to go back to the pool and select an additional 50 students from
the previously rejected ones. As most students apply to several
medical schools simultaneously, by that time all the top-ranked
candidates had already been spoken for. This meant that by May 
the pool of still-available students was made up of candidates that 
had been previously ranked between 700 and 800 by the com- 
mittee. To put this in the committee’s perspective, back when it 
chose its original 150, the lowest-ranked student accepted came 
in at 350. Now they were getting ready to add another 50 stu- 
dents, none starting higher than 700. Of the 50 students finally 
selected, only 7 had received another offer from a medical school. 

What at the time felt like an unfortunate government-dictated 
requirement causing much concern about the quality of these 
additional, low-ranked students turned into a very interesting 
field study allowing a group of researchers from the University of 
Texas to examine whether the initial ranking mattered for the 
students’ performance in medical school and one year of post- 
graduate training. All 200 students, both the original 150 and the 
50 late-admittees, were selected based on the following criteria: 
academic performance (GPA and MCAT), pre-professional ad- 
viser assessment, work experience, extracurricular activities—and 
an interview. Each of the 200 selected students had been inter- 
viewed by a member of the admissions committee and one other 
faculty member. The researchers report that “no attempt was 
made to standardize the interviews or to weigh the objective and 
subjective variables considered by the interviewers. Each inter- 
viewer submitted to the Admissions Committee a written assess- 
ment of the applicant.”” 

The shocking news, not so much for the informed readers of 
this book but for the evaluators, was that there was no difference 
in performance in medical school or afterward between the initially
accepted and the initially rejected students. Indeed, the
highest-ranked top 50 students did not outperform the initially
rejected students ranked between 700 and 800. 

How was this possible? Digging deeper, the researchers found 
that academic and demographic variables accounted for only about 
one-quarter of the difference in rankings between the initially ac- 
cepted and the initially rejected students. For example, the av- 
erage GPA of the initially accepted students was 3.48 and of the 
initially rejected students, 3.40, a tiny, hardly meaningful differ- 
ence. Instead, about three-quarters of the difference was basically 
explained by whatever had happened in the interview. While 
the grades of the two groups were almost identical, the committee 
ratings substantially differed. Initial admissions were heavily influ- 
enced by the rating of the interviewers. The researchers concluded: 
“In summary, it appears that careful initial screening of medical 
school applications by a knowledgeable person who assesses the ac- 
ademic and demographic variables, the work experience and extra- 
curricular activities, and the evaluations of pre-professional advisers 
establishes a good likelihood for successful performance in medical 
school. The traditional interview process does not appear to en- 
hance the predictive value of such initial screening. Should initial 
screening be followed by a lottery among the viable applicants?” 

A review analyzing eighty-five years of research in personnel 
psychology and nineteen different selection methods concludes 
that unstructured interviews should not be your evaluation tool 
of choice. In contrast, structured interviews do much better, par- 
ticularly when paired with a formal assessment of intelligence or 
general cognitive ability (there are a number of commercially 
available tests). General mental ability has long been shown to be 
the most valid predictor of work performance when evaluating job
candidates without previous experience in the job. But, even
among candidates with previous experience, when high levels of
skill are required, it ranks among the top predictors. 

A large meta-analysis conducted for the US Department of 
Labor, covering 32,000 employees in more than 500 distinct 
jobs, for example, reports that the validity of general mental ability 
in predicting job performance was high for most employment 
opportunities in the US economy, namely jobs of mid-level com- 
plexity, which account for 62 percent of all available jobs. General 
mental ability did better for the most complex jobs and worse for 
the least complex jobs that did not require any particular skills. 
To calibrate, reference checks and years of job experience did 
substantially worse than general mental ability. The overall pre- 
dictive power of mental ability was maximized when this mea- 
sure was combined either with a work sample test (very directly 
measuring the skill required to perform the job), integrity tests, 
or—and this is good news for all attached to interviews—with a 
structured interview. 

In the extreme case, unstructured interviews make things 
worse. One of the leading researchers who explored their con- 
sequences, Robyn Dawes of Carnegie Mellon University, and 
colleagues show in a series of experiments that it is almost impos- 
sible for evaluators to ignore nondiagnostic information—even 
though it is well known that such information reduces an evalua- 
tor’s reliance on more valuable information. To make matters 
worse, evaluators were very good at “sensemaking” after the fact. 
In one study, people even saw patterns in random sequences." 

Investing in structured interviews pays. What’s more, it is easy 
and cheap. Here is advice I urge everyone to embrace: plan ahead, 
use a checklist to structure interviews and stick to it, and evaluate
candidates in real time. If candidates are to be interviewed by several
colleagues, do not compare notes until the very end. Finally,
measure and experiment. 

Before the first job interview, determine what you are looking 
for in a candidate. Draw up your list of questions beforehand. Any 
structured interview format will diminish subjectivity in evalua- 
tions, as the research by Barbara Reskin and Debra McBrier shows, 
but what feels more like an art can turn into science if you learn 
to rely on people analytics telling you which questions typically 
yield high correlations with the attributes you care about. Those 
insights come from deliberate experimentation. Create a scoring 
system on a scale from 1 to 10 (though feel free to adapt the scale 
to your needs) for each interview question, and think about how 
much weight you want to assign each question. Maybe you want 
to treat them all equally, or maybe some of your questions have 
been proven to better capture what you are trying to measure. If 
so, assign more weight to them.’ 
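
The scoring scheme described above can be sketched in a few lines. This is a minimal illustration, assuming a 1-to-10 scale and weights fixed before the first interview; the question names, the weights, and the `weighted_score` helper are hypothetical and not taken from any particular tool.

```python
def weighted_score(scores, weights):
    """Combine per-question scores (1 to 10) into one weighted average."""
    if scores.keys() != weights.keys():
        raise ValueError("every question needs both a score and a weight")
    for question, score in scores.items():
        if not 1 <= score <= 10:
            raise ValueError(f"score for {question!r} is outside 1-10")
    total_weight = sum(weights.values())
    return sum(scores[q] * weights[q] for q in scores) / total_weight


# Hypothetical questions and weights, decided before the first interview.
weights = {"problem_solving": 2.0, "communication": 1.0, "teamwork": 1.0}
# Scores recorded for one candidate, question by question.
scores = {"problem_solving": 8, "communication": 6, "teamwork": 7}

print(weighted_score(scores, weights))
```

Dividing by the total weight keeps the result on the same 1-to-10 scale as the individual questions, so totals stay comparable if the weights are revised later.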

Be vigilant about the halo effect. Score each of the attributes 
you measure before moving on to the next one and ask all candi- 
dates the same questions in the same order. This is difficult, for 
invariably the interviews will follow different paths, and what 
seemed a logical follow-up question for the first candidate may 
seem awkward with the next. Don’t give in. It is okay to say that 
you will revisit a topic later on, but greater benefits come from 
sticking with the structured interview. In The Checklist Manifesto, 
Atul Gawande reminds us that adherence to protocol is not ri- 
gidity, but rather frees up mental capacity to deal with the hard 
issues. What Gawande finds works in medicine, finance, and all 
high-stakes decision making works as well when interviewing.’ 

It is very important to assign scores right away. Our memory 
plays too many tricks on us. According to the Innocence Project,
three-quarters of wrongful convictions later exonerated through
DNA testing resulted from unintentional lapses in eyewitness 
memory. Like all of us, eyewitnesses are influenced by the many 
biases and heuristics affecting how we assess information. And so 
are job interviewers." 

Consider that we are more likely to remember vivid examples 
than boring numbers. What is more worrisome is that we tend to 
take the ease with which an event comes to mind as an indication 
for its likelihood. When asked to guess the more likely cause of 
death in the United States, murder or suicide, most people guess 
murder. Wrong. Suicide is more prevalent. This is a classic example 
of the availability heuristic, so named by Daniel Kahneman and 
Amos Tversky. Murders tend to receive more media attention than 
suicides; consequently, murder is the more available answer. 

Evaluators misremembering is a real danger in interview-based 
quality assessments. According to the peak-end rule and recency bias, 
people make judgments based on the most intense and the most 
recent experiences rather than the total sum of the experiences or 
some average thereof. In addition, they may have been affected 
by framing during the interview process. It turns out that even 
just introducing random numbers, whether 100 or 1,000, will af- 
fect your best guess of, say, how many bridges there are in Venice. 
There are somewhere between 400 and 450. However, when I 
asked half of my students whether there were more or fewer than 
100 bridges, and the other half whether there were more or fewer 
than 1,000 bridges, the first group, adjusting up from 100, guessed 
a substantially lower number than the second group, which ad- 
justed down from 1,000. The anchoring effect, another heuristic 
coined by Kahneman and Tversky, has even been shown to affect
how much people are willing to pay for certain goods."

To guard against evaluation biases, frames, and anchors, evaluate
candidates comparatively. Once you have interviewed all
candidates, compare their responses horizontally across candidates, 
question by question. This is a procedure many academics employ 
when grading exams. We have already learned to identify exams 
by student ID numbers rather than by names, but then I do not 
evaluate a student’s exam by reading his or her answers to all ques- 
tions. Rather, I grade all students’ answers to Question 1, then 
their answers to Question 2, and so on. Ideally, I hide my assess- 
ment of Question 1 from myself to make sure I evaluate the next 
question uninfluenced by first impressions. You will likely expe- 
rience some discomfort—I know I have—as you discover that a 
student can give superb answers to the first two questions but 
deeply disappoint in the answer to your third and fourth questions. 
But acknowledging this internal inconsistency is worthwhile. 
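
For readers who keep interview notes electronically, the question-by-question comparison might be organized roughly as follows. This is only a sketch: the `by_question` helper and the sample data are hypothetical, and shuffling the candidate order for each question is an added assumption of mine, meant to keep any one candidate from always being read first.

```python
import random


def by_question(answers, seed=None):
    """Regroup answers so each question's responses are reviewed together.

    `answers` maps candidate ID -> {question: answer}. Candidate order is
    shuffled separately for every question.
    """
    rng = random.Random(seed)
    questions = sorted({q for responses in answers.values() for q in responses})
    grouped = {}
    for question in questions:
        candidates = [c for c in answers if question in answers[c]]
        rng.shuffle(candidates)
        grouped[question] = [(c, answers[c][question]) for c in candidates]
    return grouped


# Placeholder notes keyed by anonymous candidate ID.
answers = {
    "C-001": {"Q1": "answer A1", "Q2": "answer A2"},
    "C-002": {"Q1": "answer B1", "Q2": "answer B2"},
    "C-003": {"Q1": "answer C1", "Q2": "answer C2"},
}
for question, responses in by_question(answers, seed=0).items():
    print(question, [candidate for candidate, _ in responses])
```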

Likewise, avoid the consistency commonly known as “group- 
think.” The idea that groups might perform worse than individ- 
uals because they tend toward uniformity and censorship was first 
introduced by Irving Janis. Since then, much research has been 
conducted on the phenomenon, and collectively it paints a some- 
what more complex picture, recently discussed in the 2015 book 
Wiser by Cass Sunstein and Reid Hastie. Most important for our 
purposes, groups have been found to be even more prone to rely 
on the representativeness heuristic and show more pronounced 
overconfidence in their judgments than individuals do." 

If you currently use panel interviews, where a group of inter- 
viewers meets with a job candidate all at the same time, stop. 
Instead, the ideal is independent, uncorrelated assessments, uninfluenced
by what the other interviewer thinks. We all have been
in meetings where it was obvious that the group did not reach
the best outcome but rather followed the loudest voice in the
room. By now we know too much about unconscious biases to 
accept such outcomes without complaint or concern. To benefit 
from various sources of expertise, we should try to keep interviewers 
as independent of each other as possible. To state the obvious, if 
you have four interviewers, four data points from four individual 
interviews trump one data point from one collective interview. 

Slightly higher-altitude fruit: you might want to invite the 
evaluators to submit their assessments before a meeting to discuss
an applicant. This would allow an organization to aggregate
answers—those one-to-ten weighted scores to questions asked
in the exact same order—perhaps even automatically. Candidates
whose assessments clear a certain threshold—both in terms
of average scores and in terms of a minimal number of evaluations
with a certain score—would advance for consideration,
whether this entails hiring or promotion. Go one step further
and have a computer program evaluate these suggestions for 
shared biases. 
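
A minimal sketch of such an automatic aggregation step follows. I read "minimal number of evaluations with a certain score" as "at least so many evaluators scored the candidate at or above some level"; that reading, the threshold values, and the `advances` helper are all assumptions made for illustration.

```python
from statistics import mean


def advances(evaluator_scores, min_average=7.0, high_score=8.0, min_high=2):
    """Decide whether a candidate advances for consideration.

    `evaluator_scores` holds one weighted total per independent evaluator.
    Both conditions must hold: the average clears `min_average`, and at
    least `min_high` evaluators gave a score of `high_score` or more.
    """
    average_ok = mean(evaluator_scores) >= min_average
    enough_high = sum(s >= high_score for s in evaluator_scores) >= min_high
    return average_ok and enough_high


# Hypothetical totals submitted independently by four evaluators.
submissions = {
    "C-001": [8.5, 8.0, 6.5, 7.0],
    "C-002": [7.0, 7.5, 7.0, 7.5],
    "C-003": [9.0, 4.0, 5.0, 8.0],
}
shortlist = [c for c, scores in submissions.items() if advances(scores)]
print(shortlist)
```

Requiring both conditions means a candidate cannot advance on one enthusiastic evaluator alone, nor on uniformly lukewarm scores.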

Armed with individual evaluations, the outcome of the ag- 
gregation process, and the bias analysis, then have the evaluators 
meet and discuss controversial cases. For particularly important ap- 
pointments or promotions, you might want to consider a slightly 
more elaborate design: employ the previous process, but have it 
conclude with two meetings. In the first meeting, the committee 
members are randomly assigned to smaller subcommittees to dis- 
cuss the candidates, and at the second meeting, the full committee 
discusses the recommendations of the subcommittees. It is likely 
impossible to avoid groupthink in the final stage of the process, 
but this approach will allow for more than one groupthink to
evolve in the subcommittees before the full committee meets.
Generally, research suggests that groups will reach better decisions
if they adopt more structured processes, including having sub- 
groups or an assigned devil’s advocate.” 

This starts to sound like complicated, hard work. It is. But 
technology can help. A few tools are now on the market, assisting 
employers in evaluating talent more objectively. The Behavioural 
Insights Team in the United Kingdom has developed Applied, a 
tool building on much of the evidence discussed in this chapter. 
Applications are blinded and assigned to different evaluators, work
sample tests that assess applicants on real tasks required for the job
are emphasized, and independent assessments are centrally aggregated.
Similarly, GapJumpers and Unitive have also created recruitment 
platforms that anonymize the gender and race of the applicant. 
Unitive also exposes hiring managers to one piece of information 
at a time—separating where applicants went to school, for ex- 
ample, from their previous employers—to make sure an evalua- 
tor’s rating is not unduly influenced by false inferences.'® 

If you still think that this is too labor-intensive and time- 
consuming, consider the high cost of bad appointments, and our 
inability to deal with them well. Most of us are enthralled by the 
current state of affairs and experience change as a loss. This means 
bad appointments are mistakes that are not easily fixed. In a survey 
of corporate directors about their company’s talent management 
practices, Boris Groysberg and Deborah Bell found that fewer than 
10 percent of directors thought that their companies did a good 
job dealing with such mistakes. The harder work is arguably let- 
ting go a poor hire." 

Don’t let that happen to you. Pay more attention to whom you
select. But don’t just take my word that adopting these design features
for recruitment and promotion is a benefit. Measure and
experiment. And let me help you by offering a checklist for your
next interviews.


Interview Checklist


Prepare

1. Determine number of interviewers and their demographics
(use your own data!).

2. Determine questions (use your own data).


During

3. Interview separately (no group interviews).

4. Ask questions in the same order and stick to it.

5. Be aware of framing effects: anchoring, representativeness,
availability, halo...

6. Score answers to each question and score immediately
afterwards.

7. Compare answers to questions across candidates. One
question at a time.

8. Use pre-assigned weights for each question to calculate
total score.

9. Submit your scores to the lead evaluator.

10. Meet as a group to discuss controversial cases. Consider
sub-groups for important hires.


A checklist for comparative, structured interviews


Designing Gender Equality—Create Smarter 
Evaluation Procedures 


• Evaluate comparatively and hire or promote in batches.
• Remove demographic information from job applications.
• Use predictive tests and structured interviews to evaluate
  candidates. Do not use unstructured interviews.


Attracting the Right People 


If you manufacture diet sodas, you face a challenge: men, or a bit 
less than half of the potential worldwide market, do not seem to 
like drinking your products. At least, they drink them much less 
often than women do. Your challenge boils down to answering 
why this is so. Maybe men start off less concerned about how many 
calories their sodas contain. Maybe they just drink fewer sodas. 
Or, maybe, there is just something about diet soda that turns them 
off. Confronting this same challenge, several companies concluded 
that the latter was likely the case and further suspected that there 
was something about the label “diet” that did not agree with men. 
Put differently, they suspected that “diet” just didn’t align with 
male gender identity. 

They started to experiment with different messages. For ex- 
ample, Coca-Cola introduced “Coke Zero”; PepsiCo, “Pepsi 
Max”; and the Dr Pepper Snapple Group came out with “Dr 
Pepper Ten,” a ten-calorie soda whose slogan announced that “It’s 
Not for Women.” When Coke Zero was introduced to the United 
Kingdom, it was referred to as “bloke Coke,” and the head of
Coke’s integrated marketing communications in North America
went as far as to comment at the time of the launch: “We’re
positioning Coke Zero as a defender and celebrator of guy 
enjoyment.”! 

Not all marketing plays as overtly on gender stereotypes, but 
gendered messages are more prevalent than you might think. For 
example, Gillette expanded the appeal of its brand, traditionally 
associated with personal care products for men, to include women 
by designing women’s razors in pastel colors and naming them 
“Venus Divine,” “Daisy,” or simply “Gillette for Women.” Gil- 
lette’s dark-colored razors for men instead carried names like 
“Mach3 Turbo” or “M3 Power Nitro.” And while yoga is most 
often marketed to women, I recently learned that “Broga” is spe-
cifically for men.²

What messages do you send in your job advertisements, news- 
letters, web pages, blogs, and other communications? Are your 
messages framed to appeal equally to all, or are they more likely 
to speak to some but not to others? Clearly, for any organization, 
attracting the right people, whether employees or clients or cus- 
tomers, is paramount—and it starts with smart advertising. One 
takeaway: the right language matters. 

Linguists have long pointed out that language is gendered. In 
English, pronouns can present a problem. You need to hire a new 
marketer and you hope he is skilled. Or should it be she? Using 
“he or she” in such sentences has only recently become accepted. 
In other languages, even nouns can be gendered. In German, for 
example, a male professor is a “Professor” and a female professor, 
a “Professorin.” But often, professors are only referred to with the 
male noun. In fact, there is a lovely German word for this, “mitge-
meint,” which is hard to translate but means that women are also
included when a male noun is used. But, of course, this is not what
the evidence tells us. 

It is well known that women are less likely to apply for jobs 
that are male-labeled, and men are less likely to apply for jobs 
that are female-labeled. For decades in the United States, it was 
common to place advertisements in sex-segregated newspaper 
columns—jobs for women and jobs for men—until civil rights leg-
islation made the practice illegal in 1964. Jobs used to
be advertised to attract male candidates for male-dominated pro- 
fessions, such as linemen being sought by a power company, or to 
attract female candidates for jobs traditionally occupied by women, 
such as stewardesses being sought by airlines. By the early 1970s, 
the practice had completely vanished—and it had real conse- 
quences. When ads do not make reference to the desired sex of 
the ideal candidate, research shows that both men and women are 
substantially more likely to seek nontraditional jobs. Placing ads 
in integrated columns organized alphabetically rather than segre- 
gated by sex proved to be a particularly powerful solution—much 
more effective than disclaimers saying that “job seekers should as- 
sume that the advertiser will consider applicants of either sex in 
compliance with the laws against discrimination.” Once a job was 
assigned to a sex-identified column, disclaimers helped very little.³

Job ads targeting a particular sex are still common in some parts 
of the world. For example, an examination of more than 1 mil- 
lion job ads aimed at highly educated urban job seekers posted 
during 2008 and 2009 on Zhaopin.com, one of the leading on- 
line job boards in China, revealed that about one-third of the more 
than 70,000 advertising firms used job descriptions identifying the 
sex of the ideal employee in one or more of their ads. The gender-
specific ads, which favored women and men in about equal mea-
sure, were quite explicit as to what they were looking for. When
they targeted women, the ads encouraged young, tall, and attrac- 
tive applicants; when they targeted men, they encouraged older 
applicants. 

The gender targeting followed an interesting pattern, however. 
Firms seemed most explicit about the preferred sex of their ideal 
candidates when the size of the labor pool meant that they could 
afford to do so. They relied on more objective measures of per- 
formance in tighter labor markets and for higher-skilled positions. 
One of the most robust findings of the study was a negative “skill- 
targeting relationship”: as a job required more skills, whether 
measured by educational background, experience, or even the pay 
offered, fewer ads made reference to a particular sex. The analysis
may remind you of our earlier discussion of Gary Becker’s argu- 
ment that a “taste for discrimination” should disappear as competi- 
tive pressures on firms increase. Left behind and most vulnerable 
to such discrimination are those who do not possess skills dif- 
ferentiating them from others and who live where the supply of 
labor is abundant.⁴

Even in countries where job descriptions no longer make ex- 
plicit references to the sex of the ideal candidate, more subtle cues 
remain. A group of researchers set out to measure how gendered 
the wording in job advertisements remained in the twenty-first 
century and whether the use of words associated with stereotypical 
gender roles, such as competence for men and warmth for women, 
has an impact on how applicants perceive these jobs. The team 
content-coded job advertisements on Canada’s leading job search 
websites, Monster.ca and Workopolis.com. They focused on male-
dominated and female-dominated occupations. These included
plumbers, electricians, mechanics, engineers, security guards, and
computer programmers, where men occupied between 99 percent
and 74 percent of the jobs, and administrative assistants, early 
childhood educators, registered nurses, bookkeepers, and human 
resources professionals, where women occupied between 97 percent 
and 71 percent of the jobs. 

To measure whether advertisers had selected masculine wording 
for male-dominated occupations and feminine wording for female- 
dominated occupations, they relied on published lists of agentic 
and communal words, such as individualistic, competitive, ambi- 
tious, assertive, and leader in opposition to committed, supportive, 
compassionate, interpersonal, and understanding. Each ad then 
received a score based on the fraction of male- or female-gendered 
words it contained. Unsurprisingly, the ads were heavily gen- 
dered. Indeed, the fraction of gender-stereotypic words was even 
correlated with the proportion of men and women in a given 
profession. This means that more male-stereotypic words were 
used for plumbers than for computer programmers and more 
female-stereotypic words were used for administrative assistants 
than human resources professionals. 
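
The scoring step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word lists contain just the ten examples quoted in the text (the published lists are longer), the function name `gender_word_fractions` is hypothetical, and exact-word matching is a simplification rather than the researchers' actual coding procedure.

```python
import re

# Illustrative word lists only: the examples quoted in the text.
# The published lists used in the study are longer (assumption).
AGENTIC = {"individualistic", "competitive", "ambitious", "assertive", "leader"}
COMMUNAL = {"committed", "supportive", "compassionate", "interpersonal", "understanding"}


def gender_word_fractions(ad_text):
    """Return (masculine_fraction, feminine_fraction) of the words in an ad."""
    words = re.findall(r"[a-z]+", ad_text.lower())
    if not words:
        return 0.0, 0.0
    masculine = sum(w in AGENTIC for w in words)
    feminine = sum(w in COMMUNAL for w in words)
    return masculine / len(words), feminine / len(words)


# A sample sentence in the style of a teaching advertisement.
ad = ("They look for a committed teacher with exceptional pedagogical and "
      "interpersonal skills to work in a supportive, collaborative work environment.")
masc, fem = gender_word_fractions(ad)
print(masc, fem)  # 3 of the 20 words are on the communal list, none on the agentic list
```

Comparing such fractions across occupations is what allows a correlation with the share of men and women in each profession to be computed.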

Such wording matters. In follow-up experiments examining the 
impact of masculine wording, the authors found that people in- 
ferred from the ads how male- or female-dominated the profession 
was. The more women inferred a profession to be male-dominated, 
the less they found these jobs appealing. Tellingly, it was not a 
matter of perceived competence to succeed at the job. Gendered 
wording told the applicants something about whether or not they 
“belonged,” but did not affect whether or not they thought they had 
the skills to do the job.⁵

People self-select into jobs based on their preferences and their 
beliefs about whether or not they belong. Job descriptions provide
information and behavioral cues about both. And if you are not
careful, both speak immediately to gender identity fit. Economists 
refer to such behavior as sorting. People sort into neighborhoods, 
schools, jobs, clubs, and social groups. Some of them charge high 
“membership” rates and offer specific privileges to attract those 
who value these privileges enough and have the capacity to pay 
for them, thereby deterring others. Among members, linguistic 
cues are like price tags. They signal to some that they belong and 
are welcome but deter others who, consciously or unconsciously, 
realize that they would have to pay too high a price to fit in. 

Sorting is not a bad thing—in fact, in many circumstances we 
might want to encourage people to self-select. They have much 
more information about themselves than any outside observer or 
evaluator ever will. But we should be aware of what the basis of 
their sorting is. This is the moment for design. You need to scru- 
tinize your messages for the signals they send to the world. Maybe 
elementary schools want to add still more women to their roughly 
80 to 90 percent female faculty by specifying in their ads that “they 
look for a committed teacher with exceptional pedagogical and 
interpersonal skills to work in a supportive, collaborative work 
environment.” But I doubt it. Most schools want to benefit from 
100 percent of the talent pool and not deter skilled male appli- 
cants simply because the gendered adjectives in their advertise- 
ments signal to men that they do not belong. And schools are 
also keenly aware that boys need same-sex role models. This is 
an easy fix. Schools, and any other organization for that matter, 
can use inclusive language in their ads. With a few simple word-
choice changes—“they look for an excellent teacher with ex-
ceptional pedagogical skills”—you have expanded the potential
talent pool.

In many ways, by de-biasing job advertisements, we are helping
the market do its job. It should bring those who offer services 
together with those who demand them. And it should do so effi- 
ciently, aiming for the best possible matches. Elaborate match- 
making algorithms now help schools find the right students and 
hospitals identify the right residents, and vice versa. For example, 
in 2003, New York City adopted a new method for matching 
students with high schools. The system is based on a “deferred ac- 
ceptance algorithm” designed by a team of academics, including 
my former colleague and 2012 Nobel Laureate in Economics, 
Alvin Roth. The algorithm matches the best available school with 
the best available students, based both on the student’s school rank- 
ings and the school’s student rankings. The rankings are based on 
the signals the students and the schools send about what they have 
to offer and what they are looking for. Of course, the algorithm 
can be designed to be blind to demographic characteristics. In 
his illuminating 2015 book Who Gets What—and Why: The New 
Economics of Matchmaking and Market Design, Roth provides many 
more examples of successful matchmaking, including between 
people in need of an organ and those willing to donate. The beauty 
of an algorithm for finding a compatible organ donor is that it 
strives to include all medically relevant information—and nothing 
more. We should apply the same high standards when matching 
employees with employers.⁶
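
For readers curious how a deferred acceptance procedure operates, here is a minimal sketch of the textbook student-proposing version in Python. It is not the system New York City actually runs; the students, schools, rankings, and capacities are invented for illustration, and the function name `deferred_acceptance` is hypothetical. Note that the only inputs are the two sets of rankings and the seat counts, so nothing demographic enters the match.

```python
def deferred_acceptance(student_prefs, school_prefs, capacity):
    """Student-proposing deferred acceptance; returns {school: [students held]}."""
    # rank[school][student] = position in the school's ranking (lower is better)
    rank = {s: {st: i for i, st in enumerate(order)}
            for s, order in school_prefs.items()}
    next_choice = {st: 0 for st in student_prefs}  # next school each student will try
    held = {s: [] for s in school_prefs}           # tentative acceptances
    unmatched = list(student_prefs)

    while unmatched:
        student = unmatched.pop()
        prefs = student_prefs[student]
        if next_choice[student] >= len(prefs):
            continue  # list exhausted: the student stays unmatched
        school = prefs[next_choice[student]]
        next_choice[student] += 1
        if student not in rank[school]:
            unmatched.append(student)  # school did not rank this student; try the next one
            continue
        held[school].append(student)
        held[school].sort(key=lambda st: rank[school][st])
        if len(held[school]) > capacity[school]:
            # Over capacity: release the lowest-ranked student, who proposes again later.
            unmatched.append(held[school].pop())
    return held


# Hypothetical example data.
student_prefs = {"A": ["X", "Y"], "B": ["X", "Y"], "C": ["Y", "X"]}
school_prefs = {"X": ["B", "A", "C"], "Y": ["A", "C", "B"]}
capacity = {"X": 1, "Y": 2}

matching = deferred_acceptance(student_prefs, school_prefs, capacity)
print(matching)  # {'X': ['B'], 'Y': ['A', 'C']}
```

Acceptances stay tentative until no student is left to propose, which is why the procedure is called "deferred" acceptance.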

While many of us charged with hiring might not engage in a 
formal matchmaking process, we can improve our search proce- 
dures. In addition, we should be wary of the language used not 
only in our own communications but also in the ones we receive. 
Letters of recommendation, for example, are another place where
gender-biased language often creeps in. According to one study,
letters written for female medical faculty tended to be shorter,
were more likely to raise doubt (faint praise; hedges; negative, un-
explained comments), and referred to a candidate’s teaching instead
of her research, reinforcing stereotypes portraying women as teachers
and men as researchers. For an example of gendered language,
consider this: “On a personal level Sarah is, in my opinion, the 
quintessence of the contemporary lady physician who very ably 
combines dedication, intelligence, idealism, compassion, and re- 
sponsibility without compromise.” Doubt raisers included lan- 
guage such as this: “It appears that her health and personal life 
are stable,” or “While not the best student I have had,” or “As an 
independent worker she requires only a minimum amount of 
supervision,” or “She worked hard on projects that she accepted.”⁷
Eliminating gendered language in letters of recommendation and 
job ads is low-hanging fruit. But we do not always know a priori 
what wording works best to attract the right kinds of employees. 
Sometimes, it pays to test the impact of your messages, even as you 
are sending them. 

Testing is exactly what the Government of Zambia did when 
it was looking to fill the newly created position of community 
health worker, or CHW. It collaborated with researchers to better 
understand how it could attract the right people through messages 
in job advertisements. In a field experiment, the researchers learned 
that while the job required people who wanted to do good and 
care for others, putting such information in the job advertisement 
attracted the wrong kinds of people. “Want to serve your com-
munity? Become a CHW!” attracted less qualified applicants than
ads that made the extrinsic benefits of the job salient: “Become a
CHW to gain skills and boost your career!” Such career incentives
led more qualified people to apply who were then more effective
at delivering health services. Interestingly, they also proved no
more likely to leave the job for better opportunities than the 
health workers who had responded to the service-oriented ad- 
vertisement. It turned out that both groups wanted to do good, 
but higher-ability workers also wanted to have a career. Neither 
the government nor the researchers would have known this 
without actually testing the impact of various messages.⁸

Sometimes, even the gender effects of our messages cannot be 
known ahead of time without running an experiment. Consider 
gender differences in attitudes to risk and competition. If they exist 
(we will discuss this in more detail in Chapter 8), we might ex- 
pect men and women to respond differently to ads depending on 
how competitive a workplace they describe. That is what Jeffrey A. 
Flory, Andreas Leibbrandt, and John List sought to find out. They 
ran an advertisement in sixteen cities across the United States for 
a “News Assistant,” in which the primary responsibility was de- 
scribed as creating news digests by summarizing stories and writing 
short reports. The researchers examined whether women were in- 
deed more attracted to less competitive compensation schemes. 
Job seekers encountered ads that randomly described the position’s 
compensation as being fixed, partially dependent on job perfor- 
mance, or substantially dependent on job performance. Across all 
sixteen cities, almost 7,000 people applied—but far fewer women 
applied when the position’s compensation was advertised as ex- 
tremely competitive. While neither women nor men embraced 
competitive environments, women disliked them decidedly more 
than men.⁹

This problem seems general, even global. Laboratory exper-
iments conducted in various countries all present similar evi-
dence. Women tend to opt out of competitive and variable pay
schemes, typically due to their more pronounced aversion to
risk, lower self-confidence, and dislike of competition. There 
seems to be at least one design that affects this gendered self- 
selection. Interestingly, the gender gap in competitive environ- 
ments reverses when people compete in teams. This appears to 
be a pull-push phenomenon, with teams attracting women even as
they push certain men away. Working in groups appears to boost 
women’s self-confidence, while men grow concerned about their 
team members’ abilities. In particular, high-performing men shy 
away from competitive pay schemes when pay depends on team 
performance.¹⁰

As is often the case with gender inequality, women may well 
have highly practical reasons for avoiding competitive compensa- 
tion schemes, just as men have reasons for embracing them. An 
analysis of German labor data suggests an additional reason for why 
women self-select into piece-rate schemes: compensation tied more 
closely to objectively measured performance leaves less room for 
discrimination. In the sample of blue-collar workers analyzed, 
piece-rate schemes indeed worked better for women, with a smaller 
unexplained gender wage gap in the piece-rate regime than in the 
time-wage regime. History lends support to this observation. 
Claudia Goldin took the long view, studying where women have 
historically worked in the United States. She discovered that 
women tended to work in occupations that rely heavily on objec-
tive performance measures.¹¹

Goldin also showed that women self-select into occupa- 
tions that allow for more flexibility. In her 2014 presidential ad- 
dress to the American Economic Association, she identified the 
premium that women place on flexible work conditions as a key
factor affecting gender segregation in the labor market. It also helps
explain gender differences in pay and promotion in occupations
that make flexibility available to employees only at a high cost. 
These include the corporate, finance, and legal sectors, which dis- 
proportionately reward people who work long hours continuously, 
without taking time out for family care. In contrast, in science, 
technology, and health care, demands on people’s time have started 
to change. Goldin points out pharmacists as an interesting case in 
point. Pharmacy, where earnings have a more linear rela-
tionship with hours worked than is true in business or law, is the
third-highest-paying occupation for women and the eighth-highest 
for men. The more hours pharmacists work, the more money they 
make, irrespective of whether they work part time or full time, or 
take time out to care for their family. Correspondingly, the oc-
cupation has one of the lowest gender pay gaps among high-paying
jobs and attracts a disproportionate share of women.¹²

Sectors trying to attract more women use this knowledge in 
their job descriptions. And some go further and try to change the 
very dynamics that led to Goldin’s conclusions. In an interview 
with the Financial Times, Niall FitzGerald, then co-chair of Uni- 
lever, explained how this could work: when one of his colleagues 
said, “We must identify very clearly those jobs which can be op-
erated in a flexible manner,” his response was, “You’re going in
absolutely the wrong direction. We will say: ‘In principle, every
job can be operated in a flexible manner unless it can demonstrably
be shown to be otherwise.’”

There is a reason for the due process requirement in criminal 
law that stipulates a defendant is presumed to be innocent until 
proven guilty. Defaults are powerful, and requiring the govern-
ment to prove the guilt of a defendant and not the other way
around safeguards justice. As a default, “flexibility unless proven
not to work” would dramatically change the structure of work. If
flexible work arrangements were the default until proven objec- 
tively untenable, or until employees decided that they wanted to 
opt out, many more people would embrace working flexibly. Much 
research by behavioral economists underscores that moving from 
“opt-in” to “opt-out” defaults can have huge impacts. The now- 
iconic example is how participation in employer-sponsored 401(k) 
plans increases dramatically when companies automatically enroll 
their employees compared to when employees have to take the ini- 
tiative. A well-established bias explains why: people tend to cling to 
the status quo, aptly named status-quo bias by William Samuelson
and Richard Zeckhauser in 1988. In particular, when people face 
complicated decisions, such as planning for retirement, they tend 
to avoid making one and procrastinate. In addition to defaulting 
people into a particular savings plan or flexible work arrangement 
from which they can opt out if they wish, simplifying the process 
and requiring people to make an active choice can also help them 
save more.¹³

The Australian company Telstra, mentioned earlier, is showing 
how this might work. It made flexibility the norm. “Flexibility 
applies to every kind of role at Telstra,” it now states on its web- 
site. Conversations I had with Telstra employees while I was in 
Australia in 2015 suggest that “All Roles Flex” in fact has become 
the default work arrangement. Before implementing the change 
across the board, Telstra ran a three-month pilot program in its 
customer sales and service business unit, which it described as “a 
new and disruptive position around mainstreaming flexibility 
that would amplify productivity benefits, lift engagement, estab-
lish a clear market position, and also enable a new way of working.”
During the pilot phase, it monitored who applied for posted job
openings and for what reason. As expected, the share of women in
the applicant pool increased, and about one-third of the candidates 
reported that they had applied because of Telstra’s stance on flexi- 
bility. While not a randomized controlled trial, the Telstra ex- 
perience provides some hints at how flexibility might affect who 
companies are able to attract.¹⁴

Of course, if flexible work arrangements were no longer a 
woman’s issue, there would no longer be the gender-specific career
penalties that Goldin writes about. With increased demand for 
flexibility, we can (and you should) anticipate that competitive 
labor markets will adjust to employees’ preferences and stop dis- 
criminating against people seeking flexibility. 

When seeking to hire, in addition to worrying about the con- 
tent of your message, you also need to consider the context in 
which your job advertisement is read. Framing does not end with 
the language you choose or the defaults you set but includes, as 
marketing experts would tell us, all aspects of design, including the 
placement of the ad and the proximity of other ads. Recall the 
lesson learned in Chapter 6: our minds want to evaluate everything 
comparatively. Today, technology even allows job seekers to take 
what other applicants are doing into account. 

Benefiting from the large amounts of data collected by the pro- 
fessional social networking website LinkedIn, Laura Gee of Tufts 
University was able to measure how job seekers respond to infor- 
mation on what other applicants are doing. She analyzed the job 
search behavior of almost two million job seekers from more than 
200 countries viewing 100,000 job postings in March 2012. About 
two-thirds of the job seekers were male, thirty-six years old on 
average, and almost half came from the United States. Most
everyone had a bachelor’s or a post-bachelor’s degree.

Job seekers looked at a wide variety of jobs and responded to 
postings from about 21,000 companies, with the two most preva- 
lent sectors being high tech and finance. Gee was interested in 
better understanding one particular aspect of the job search, namely 
how job seekers responded to information on what other appli- 
cants were doing. In line with the above findings, it could be that 
knowing others are also applying for a given job encourages women 
to shy away, disliking the competition that comes with popular 
jobs. Alternatively, perhaps knowing what others are doing pro- 
vides additional information on the potential attractiveness of the 
job. The more applicants are averse to ambiguity, the more they 
might appreciate this additional piece of information, making them 
more likely to apply. 

To investigate, Gee ran a field experiment where job seekers 
were randomly assigned to one of two conditions: they either saw 
the number of others who had started an application when they 
viewed the online posting or they did not. Interestingly, the ad-
ditional information hardly mattered for men, but it increased the
likelihood that women applied by 10 percent. On any given day, this
means employers seeking 100 percent talent could receive thou- 
sands of additional applications from women, including in indus- 
tries where the fraction of women is still small, such as high tech 
and finance. Being more averse to ambiguity and less confident, 
women were reassured that a job was worth applying for when 
they knew that others found that job appealing. Being less averse 
to ambiguity and more confident, men found the additional in- 
formation of little importance. Whereas women generally have 
been found to shy away from competitive environments, knowing 
about the desirability of a job seemed to outweigh whatever con-
cerns they have about stiffer competition.¹⁵

Attracting the right kinds of people to apply for a job is hard.
Along with the various biases employers can anticipate and adjust 
for, there are associations and contextual factors—whose ad is next 
to yours, for one example—that are beyond your control. But as 
I have noted, and any human resources manager can attest, 
handling bad hires is difficult and sometimes costly. Zappos, the 
online shoe store, used an innovative design to make sure it ended 
up with the right kinds of employees. After a few weeks on the 
job, when new hires spend much of their time in training, the 
company offered their new employees an opportunity to quit— 
accompanied by a “golden handshake” consisting of one month 
of salary in addition to whatever they had already earned. Why 
would Zappos do such a thing? Because it allowed employees to 
“sort.” Those who liked the company and their job stayed, valuing 
the opportunity to be a Zappos employee more highly than the 
cash bonus offered to them for quitting. Others left without im-
posing additional costs, either to the company or themselves.¹⁶

Attracting the right people instead of managing the wrong ones 
is one of the most important tasks any organization confronts. This 
is the mantra Google lives by—or, as Laszlo Bock writes: “Only 
hire people who are better than you.” In an interview on the com- 
pany’s hiring and corporate culture, Eric Schmidt, the executive 
chairman, explained that in addition to judging the technical qual- 
ifications of potential hires, a key focus at Google was to deter- 
mine whether they were passionate and committed to innovation. 
Surely, allowing all Google engineers to spend 20 percent of their 
time developing their own ideas serves as a sorting device. It at-
tracts creative, independent minds who invent Google News or
Orkut, a social networking site. The time is not written in stone
nor necessarily utilized, but it matters as an idea: “No one gets a
‘20 percent time’ packet at orientation, or is pushed into distracting
themselves with a side project. Twenty percent time has always 
operated on a somewhat ad hoc basis, providing an outlet for 
the company’s brightest, most restless, and most persistent em-
ployees—for people determined to see an idea through to com-
pletion, come hell or high water.”¹⁷

Not many of those “seeing an idea through to completion, 
come hell or high water” are women. In the spring of 2015, a 
gender discrimination trial brought by a former junior partner at 
a venture capital firm in Silicon Valley drew renewed attention to 
the low fraction of women in technology. While in the end a jury 
found against the plaintiff, the low numbers were undeniable: 
fewer than 20 percent in most tech companies and even fewer in 
Silicon Valley’s venture capital firms. Some argue that the “tech 
bros” mentality of Silicon Valley keeps women out and even dis- 
courages female students from focusing on computer science. Per- 
haps. Surely, the male-dominated environment does not help tech 
firms attract women. As we know, deviating from behavior that is 
expected of a social category, either by others or by oneself, can 
be costly. A woman who acts against the norms by definition 
doesn’t “belong”; not surprisingly, the fear of not belonging is
influential.¹⁸

Indeed, research by Boris Groysberg, Ashish Nanda, and Nitin 
Nohria (now dean of Harvard Business School) suggests estab- 
lishing belonging turns out to be a major concern of female job 
seekers. They report that women consider more factors than men 
when screening jobs; in particular, cultural fit, values, and mana-
gerial style. There is a surprising silver lining to this research, how-
ever: it carries hidden benefits for women and their employers. In
follow-up work, Groysberg identifies this scrutiny as one of the
key variables explaining why women transition more successfully
to new companies than men. Women know better what they are 
getting themselves into. 

The researchers analyzed the performance of more than a 
thousand “star” analysts working for almost eighty different in- 
vestment banks over a nine-year period. Analysts were labeled 
“stars” if they were ranked as one of the best in the industry by 
Institutional Investor magazine. The team was interested in better 
understanding whether the analysts’ skills were portable when they 
switched companies. It turns out most analysts lost their stardom 
when they changed employers unless they moved to a better firm 
or brought their whole team along—with the exception of female 
analysts. Not only had the women studied a potential new em- 
ployer more carefully before joining, they had also built their ex- 
pertise differently than their male colleagues. The top-performing 
female analysts had “built their franchises on portable, external 
relationships with clients and the companies they covered, rather 
than on relationships within their firms.” Or as one female star 
analyst put it: “For a woman in any business, it’s easier to focus 
outward, where you can define and deliver the services required 
to succeed, than to navigate the internal affiliations and power 
structure within a male-dominant firm.”¹⁹

People choose organizations based on their preferences and 
their beliefs about whether or not they could thrive in a given 
organization. Messages shape those beliefs. Consider the messages 
sent when Lieutenant General David Morrison stated in a video 
posted on the Australian army’s official YouTube channel that he 
was committed to inclusion. “If that does not suit you, then get 
out,” Morrison flatly declared. “There is no place for you amongst
this band of brothers and sisters.” Acting in response to a 2013 in-
vestigation into sexual abuse, Morrison sent a strong message. In
2014, Morrison joined the Australian delegation to the Global 
Summit to End Sexual Violence in Conflict in London. Speaking 
again with admirable bluntness, he said that armies that assign 
more value to men than to women and tolerate sexual violence 
“do nothing to distinguish the soldier from the brute.”²⁰

Will these messages attract and retain soldiers valuing equality 
and inclusion? Time will tell. And while actions have followed his 
words, we all know that talk can be cheap. When and how mes- 
sages affect behavior is a large field of inquiry in itself, but experi- 
mental evidence is rare. One example, however, is encouraging. 

Robert Jensen and Emily Oster took advantage of the fact that 
cable television became available at different times in different parts 
of India, allowing them to trace whether attitudes and behaviors 
went along with exposure to the new information cable program-
ming provided. They found that the introduction of cable televi-
sion was associated with improvements in women’s status in rural
areas, including increases in female school enrollment, decreases
in fertility, reported increases in autonomy, and decreases in the
reported acceptability of beating women and in son preference.
The information conveyed via cable television, often through somewhat
surprising means, such as soap operas, exposed rural viewers to 
gender attitudes and ways of life, including within the household, 
more prevalent in urban areas. And it changed behavior.” 

Sorting mechanisms are powerful and often overlooked. 
Those charged with attracting the largest, most talented pool of 
applicants should make sure they scrutinize the messages, overt and 
biased, conveyed in their advertisements, websites, or other
communications. The wording used, the incentive schemes employed,
the work hours required, or even the number of others applying
may unintentionally attract some but not others. And while talk
definitely can be cheap, sometimes people do listen. They care 
about what people they look up to have to say, including the 
heroes and heroines of soap operas, who may even become role 
models—a topic we will turn to in Chapter 10. 


Designing Gender Equality—Attract the Right People 


• Purge gendered language from job advertisements and
other company communications.
• Pay for performance, not for face time.
• Make the application process transparent.


Part Three 


HOW TO DESIGN 
SCHOOL AND WORK 


8 


Adjusting Risk 


Guess what? Women do not like to guess. And it matters. Con- 
sider the widely taken Scholastic Aptitude Test (SAT), used for 
generations to help determine college admissions. Female test- 
takers are more likely than male test-takers to skip questions rather 
than to offer a guess. In one of the most consequential design in- 
novations promoting gender equality in recent history, the people 
behind the SAT took risk out of the multiple-choice questions. 
Introduced in the United States in 1926 to create a meritocratic 
screening device, the SAT is used to select students for college 
based on ability (or “innate intelligence,” as it was originally pur- 
ported) rather than their demographic characteristics, where they 
come from, or how much money their families have. It is admin- 
istered by the not-for-profit College Board and considered to be 
one of the most important tests American students ever have to 
take. Despite its honorable aspirations, as early as 1938 students 
started to take preparatory classes with Stanley Kaplan to improve 
their scores. This turned into a multi-billion-dollar industry with
preparatory courses now being offered by many other companies
and private tutors. The expensive classes and tutors focus on
knowledge, skills, and, often most important, test-taking strategies—a
far cry from the test’s original intention of encouraging meritocracy 
instead of reinforcing existing privilege.' 

These and other concerns led the newly appointed president 
of the College Board, David Coleman, to initiate a fundamental 
overhaul of the SAT in 2012. Among the revised SAT’s new features
is a design that levels the playing field between risk takers
and risk avoiders. Starting in 2016, the SAT no longer penalizes 
test-takers for incorrect answers in multiple-choice questions. In 
the past, test-takers received one point for each correct answer, but
lost a quarter of a point for each incorrect answer. Test-takers could 
also skip questions, which had no impact on their raw scores. The 
redesigned SAT does away with the quarter-point penalty for in- 
correct answers.” 

What were Coleman and colleagues thinking? Surely, the Col- 
lege Board did not just want to encourage wild guessing. Or did 
it? Under the old regime, the smart strategy was for a test-taker to
guess whenever he or she could exclude at least one possible an- 
swer. Here’s why. Each multiple-choice question had five possible 
answers. If a test-taker could eliminate one, then randomly 
guessing meant a one-in-four chance of picking the right answer. 
If people adopted this strategy on, say, forty questions, on average 
they would get ten right and thirty wrong. For each correct an- 
swer, they would gain one point; for each wrong answer, they 
would lose one-quarter of a point. Thus, they would end up with 
ten points for ten lucky answers and lose seven and a half points 
for thirty unlucky answers for an expected gain of two and a half 
points. (If you were unable to eliminate an answer for each question
and blindly guessed, the result would be a wash, on average
gaining eight points and losing eight points.) Thus, a person’s SAT
score was determined by a combination of knowledge, some un- 
derstanding of probability theory, a willingness to take a risk—and 
pure luck. Guessing did not hurt—the worst possible outcome 
from pure guessing was an expected value of zero. And if a test- 
taker knew something and could exclude some alternatives, 
guessing definitely helped. 
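
For readers who like to check the arithmetic, the reasoning above can be reproduced in a few lines. This is only a sketch of the forty-question example: the function name and its parameters are illustrative assumptions, not anything published by the College Board.

```python
from fractions import Fraction


def expected_guessing_gain(questions, options=5, eliminated=0,
                           penalty=Fraction(1, 4)):
    """Expected raw-score change from guessing at random on `questions` items.

    Each item has `options` answer choices, of which the test-taker has
    ruled out `eliminated` wrong ones before guessing among the rest.
    A correct guess earns 1 point; a wrong guess costs `penalty` points.
    (Hypothetical helper written for this illustration.)
    """
    remaining = options - eliminated
    p_correct = Fraction(1, remaining)
    expected_right = questions * p_correct
    expected_wrong = questions - expected_right
    return expected_right - penalty * expected_wrong


# One answer eliminated on each of forty questions: ten right, thirty wrong.
one_eliminated = expected_guessing_gain(40, eliminated=1)

# Nothing eliminated: eight right, thirty-two wrong, which cancels out.
blind = expected_guessing_gain(40, eliminated=0)

print(float(one_eliminated), float(blind))
```

Under the old quarter-point penalty, the first case comes to two and a half points and the second to zero, matching the figures in the text.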

With the penalty abolished, guessing becomes even more at- 
tractive. Using our example from above, with the quarter-point 
penalty gone, a pure guessing strategy now yields an expected 
value of 8. Guessing on the SAT has always been advisable, but 
with the new scoring system it has become all the more so. The 
College Board has taken the risk out of guessing. As a consequence, 
those more averse to risk, including women, should benefit. 
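
What "taking the risk out of guessing" means can also be put in numbers. The sketch below compares blind guessing on forty five-option questions under the old and the new scoring rule. The helper name is hypothetical, and the worst-case figure is an added illustration (every guess turning out wrong) rather than a number from the College Board.

```python
from fractions import Fraction


def guessing_outcomes(questions, options, penalty):
    """Return (expected, worst_case) raw points from blind guessing.

    Skipping every question scores 0 under either rule, so both numbers
    can be read as the gain or loss relative to skipping.
    (Hypothetical helper written for this illustration.)
    """
    expected_right = Fraction(questions, options)
    expected_wrong = questions - expected_right
    expected = expected_right - penalty * expected_wrong
    worst_case = -penalty * questions  # every single guess is wrong
    return expected, worst_case


old_rule = guessing_outcomes(40, 5, Fraction(1, 4))  # quarter-point penalty
new_rule = guessing_outcomes(40, 5, Fraction(0))     # no penalty

print("old rule:", old_rule)
print("new rule:", new_rule)
```

Under the old rule a blind guesser expected zero points but could, with bad luck, end up as much as ten points below someone who skipped everything. Under the new rule the expected gain is eight points and the worst case is no worse than skipping, which is why a risk-averse test-taker no longer has a reason to leave a question blank.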

Indeed, a number of studies suggest that women are more likely 
than men to skip multiple-choice questions. An analysis of the Fall 
2001 mathematics SAT scores shows that women’s tendency to 
skip more questions can explain up to 40 percent of the gender 
gap in SAT scores. Similar effects have been found for multiple- 
choice tests in South Africa, microeconomics tests in the United 
States, and Hadassah aptitude tests in Israel. In the United States, 
20 to 40 percent of the well-established gender gap in political 
knowledge has been documented as resulting from men providing 
substantive but uninformed answers to surveys rather than marking 
“I don’t know.’? 

Katie Baldiga Coffman took these results to heart. She developed
experiments to measure not only any gender gap in willingness
to guess on SAT-like tests, but also whether the gap was due
to men being more willing to assume risk. And even more
importantly, she tested whether decreasing the risk involved in
guessing, that is, doing away with the penalty for wrong answers,
would help close the gap. 

It turns out that even among equally able test-takers, men were 
more willing to guess and women more likely to skip questions. 
Coffman’s experiment was the first study designed to measure 
what people would have known if they had been forced to answer. 
She learned that it was not gender differences in knowledge that 
led to differences in test scores. Rather, women’s reluctance to guess 
lowered their test scores significantly. About half of the gender 
gap in willingness to guess was due to differences in willingness 
to take risks—enough for the design innovation to work. Indeed, 
her findings are quite dramatic: when no penalty was assigned for 
a wrong answer, all test-takers answered all questions.* 

Literally hundreds of studies, with a very small number of ex- 
ceptions, support the notion that women are more averse to risk 
than men. In a large representative sample of more than 22,000 in 
Germany, Thomas Dohmen and colleagues found that women as- 
sessed themselves lower on a risk scale as well as made more risk- 
averse choices in a lottery for real money. Controlling for lots of 
other possible determinants of risk taking, the authors also report 
that shorter respondents, older people, and children of less-educated 
parents were all more risk averse. 

In addition to collecting data on people’s general willingness 
to take risks, the authors also focused on five specific domains: 
sports and leisure, health, career, driving, and finances. Women 
were less willing to assume risk in each. However, the gender gap 
was most pronounced in financial matters and in driving, and 
it was least pronounced in matters of career. Still, the differences
were large enough to help understand some career choices.
For example, individuals who are less willing to take risks have
been shown to choose occupations with more stable and predictable,
albeit lower, average earnings.°

Along with being less likely to guess, women are also less likely 
to speak up or offer opinions. They also need more assurances be- 
fore being willing to run for public office. While there are many 
barriers to women’s political participation and leadership, gender 
differences in willingness to take risks may contribute. Armed 
with the fact that serving in a state legislature is predictive of who 
runs for a US congressional seat (often more than half of sitting 
congressmen and congresswomen have served in state legislatures), 
Sarah Fulton, a political scientist at Texas A&M University, ex- 
amined how many men and women currently serving as state-level 
legislators were considering a move up to Congress. Put more pre- 
cisely: how high did the odds of winning have to be for male and 
female state legislators to consider running? For women, those 
odds had to be at least 20 percent. Men, on the other hand, were 
willing to jump into a race if the chance of winning was larger 
than zero. Female politicians do not gamble on long odds. They 
compete when the probability of success is decent. When they do 
run, however, their chances of winning are identical to men’s.° 

The Women and Public Policy Program offers an extracurric- 
ular training program, From Harvard Square to the Oval Office: A 
Political Campaign Practicum, for students who want to consider 
running for public office. The program provides students from 
around the world hands-on training in public speaking, fund- 
raising, data-driven voter targeting, and Get Out the Vote strat- 
egies, as well as navigating political parties and campaign media. 
It also provides students the chance to grow their professional and 
personal networks, in part by exposing them to political role
models at all levels of government, such as former US secretary of
state Hillary Rodham Clinton, Minority Leader of the US House
of Representatives Nancy Pelosi, former congresswoman Anne 
Northup, and former governor Christine Todd Whitman. 

In one particularly noteworthy session, Jennifer Lawless, a 
political scientist at the American University in Washington, DC, 
one of the leading experts on women in politics, shared US statis- 
tics on what she refers to as the “gender gap in political ambition.” 
Even though many things have changed over time, this gap has 
not. Women remain less ambitious to run for office than their male 
counterparts. The reasons are numerous, but include—based on 
people’s self-assessments—risk aversion, lower self-confidence, and
less competitiveness. Women also perceive the electoral environ- 
ment as more hostile and biased than men do—and, based on Law- 
less’s work, they are right. Finally, and this is something you and 
I can start correcting today, they are less likely to be asked to run. 
This points to one of the easiest design interventions this book will 
offer: urge a woman to run for public office today! (Later in this 
book I will introduce you to startling evidence from India that 
underscores how profoundly women in public office can change, 
and indeed improve, things.)’

Becoming a politician is a complex business, and disentangling 
exactly which part of a person’s decision is due to his or her risk 
preferences is a difficult science. This is why TV game shows can 
be useful. Consider that female contestants have been found to 
take fewer risks than men on shows such as Who Wants to Be a 
Millionaire or Deal or No Deal. Natalia Karelaia of INSEAD and 
collaborators, for example, analyzed women’s and men’s behavior 
in a Colombian game show, El Jugador (The Player). In this game, 
contestants had to answer general knowledge questions for up to
five rounds. They started by competing against five other players.
After each round, each contestant has to decide whether to continue
to the next round or quit. If no one voluntarily quits, the
player with the lowest score is forced to leave. Players who quit 
voluntarily can keep the money they have accumulated. If they 
stay and end up with the lowest score after the next round, how- 
ever, they lose all their earnings and are sent home. 

Similar to most game shows, substantially more men than 
women audition for El Jugador. Because the organizers made an 
extra effort to recruit women, they ended up with an almost equal 
gender split among contestants. But women were substantially 
more likely to quit voluntarily than equally skilled men. Why? 
Can this be explained by general differences in men’s and women’s 
willingness to take risks, or is something happening during the 
contest that makes men even more willing to assume risk?’ 

In his 2012 book The Hour Between Dog and Wolf, John Coates 
examines the impact of success and failure on risk taking. Having 
once been the head of a derivatives trading desk, he had the net- 
work and wherewithal to collect saliva samples from traders to 
measure their biochemical responses in “moments of transforma- 
tion,” namely after wins and losses. It turns out winners show 
heightened levels of a particular hormone, testosterone, found to 
lead to an increased appetite for risk, while testosterone levels in 
losers were reduced. Coates argues that such chemical reactions 
exaggerate booms and busts: when things go well, male traders 
seek ever more risk; when things go badly, they become overly 
risk averse. Women, however, seem to be largely immune to this 
winner’s effect. Alexandra van Geen of Erasmus University Rot- 
terdam shows in her experiments that men who have just won a 
lottery are more willing to take on risk than men who just lost,
while winning did not matter for women. Women, of course, start
out with only a fraction of men’s testosterone levels. New work
by John Coates, involving another hormone, cortisol, suggests that 
reducing stress on the trading floor might go a long way toward 
more accurate risk taking. Having more women on Wall Street 
would by biological necessity increase the diversity of testosterone 
and cortisol levels—an intriguing reason for firms to hire more 
female traders?’ 

Recent evidence from an experimental asset market and a 
meta-analysis based on thirty-five different markets from earlier 
studies by Catherine Eckel and Sascha Füllbrunn suggest that the 
answer is “yes.” Earlier studies using the same paradigm had found 
a consistent bubble pattern: in these asset markets, prices typically 
start below fundamental value, then increase way above it and 
crash before maturity. But they had never tried the paradigm with 
women only. A replication with exclusively male or exclusively fe- 
male traders revealed substantially larger speculative bubbles in 
all-male than in all-female markets. In some cases, all-female mar- 
kets even produced negative bubbles with prices below funda- 
mental value. A follow-up experiment showed that mixed-gender 
markets fall somewhere in between. The meta-analysis supports 
these findings, showing that price bubbles are smaller the larger
the share of women in a given market. The authors conclude:
“These results imply that financial markets might indeed operate 
differently if women operated them . . . Our data suggest that in- 
creasing the proportion of women traders might have a dampening 
effect on the likelihood and magnitude of bubbles.” 

The pressures on the trading floor perhaps mask another, subtler 
effect of mixed groups. Who else is present when a particular task 
is undertaken may affect performance. Consider the finding that
when put in groups of three and asked to complete a difficult
math test, women did worse in the presence of men. Consider as
well that this wasn’t true when completing a difficult verbal test. 
Men, however, were not affected by the gender of the people 
present. Or consider a curious and worrisome pattern researchers 
detected when analyzing all 2005 SAT scores: taking the test in a 
smaller, less crowded venue increased your score. To the rationalist, 
this seems bizarre. Student test-takers are all aware that thousands 
and thousands of people take the test on the same day; the number 
of people who happened to be in the room with you during your 
test shouldn’t matter. Yet it did, introducing another unintended 
bias—test-taker density—in SAT results. 

Clearly, the design of the test matters, but so does the physical 
environment in which the test is taken. Seeing so many other stu- 
dents taking the SAT may have reminded test-takers of how com- 
petitive this business of getting into college is. The environment 
increased the perceived risk of not making it, and undermined 
people’s self-confidence and effort." 

An entirely different question is whether the SAT, or any test 
for that matter, helps us predict students’ future performance. The 
SAT is still widely used to determine college admissions in the 
United States on the belief that students’ test scores do a good 
job of predicting college achievement. The evidence, however, 
is mixed. And even those studies that suggest a link exists be- 
tween test results and college grades, completion rates, and even 
post-graduation income, also find that SAT scores under-predict 
women’s college performance compared to men’s.'” 

A series of studies examining how well various tests predicted 
student performance in the University of California system support
these concerns. They suggest that the SAT is a relatively poor
predictor of college performance. High school grades proved to
be the best predictor of college grades and graduation rates,
irrespective of the school the test-taker had attended. What is more,
and in keeping with findings regarding the value of workplace
performance evaluations, the UC report raises concerns about using
a student’s potential as a criterion for admission, as it tends to
unfairly disadvantage poor and minority applicants."

To make matters worse, we need to add another concern. 
Knowing how well someone performed—a student, a subordinate, 
or a teammate—might affect how you treat them. In this way, test 
performance becomes a self-fulfilling prophecy. The top students 
or employees do not necessarily do better because they are in fact 
better performers, but because you give them more attention, offer 
them better opportunities, and create work conditions in which 
they can thrive. 

A well-known study originally published in 1966, by the 
Harvard psychologist Robert Rosenthal, drives this point home. 
He joined forces with Lenore Jacobson, the principal of an ele- 
mentary school in San Francisco, and administered a cognitive 
ability test in eighteen different classrooms, ranging from kinder- 
garten through fifth grade. He then informed the kids’ teachers 
of the test results. About 20 percent of the students showed the 
potential for “unusual intellectual gains.” Indeed, the test proved 
to be right. A year later, the 20 percent of students identified as 
having unusual potential had gained an average of twelve IQ 
points, compared to an average of only eight points for the rest of 
the students. Two years later, the top students still outperformed 
their classmates. They got smarter more quickly than their peers. 

What the teachers didn’t know was that Rosenthal had chosen 
the initial “top” 20 percent at random. The difference “was in the
mind of the teacher.” Teachers who believed students to be gifted
paid more attention to the children, set higher expectations, and
supported the pupils in their learning and development. While sev- 
eral studies have confirmed the effect of self-fulfilling prophecies 
in the classroom, a review of the evidence suggests that the im- 
pact is likely more nuanced. The most worrisome news is that 
teacher expectations seem to be particularly relevant for groups 
who traditionally face discrimination. The better news, however, 
is that the effects of self-fulfilling prophecies are not as significant 
as feared and do not tend to accumulate over time.!* 

Still, we need to remain vigilant, as even very subtle cues can 
affect what people believe possible for themselves. Stereotype threat 
has been one of the most widely studied topics in social psychology since
its introduction by Claude Steele and colleagues in 1995. The theory holds
that certain situational factors can lead people to confirm the neg- 
ative stereotypes about the social group they belong to. For ex- 
ample, Steele showed that when women were told that a math test 
was particularly difficult for women, they performed worse than 
men. When the tests were presented as being equally difficult for 
both genders, the gender gap in performance disappeared. Neu- 
roscience now offers clues as to why. When women are confronted 
with threatening environmental cues, neural activity increases in 
the ventral anterior cingulate cortex, the affective network in- 
volved in processing negative social information. As this activity 
increases, their math performance worsens. 

Stereotype threat starts early. The psychologist Nalini Ambady 
and colleagues tested its impact on Asian American children’s 
math performance in the Boston area. Before solving a few age- 
appropriate math problems, the youngest girls, five to seven years 
of age, were asked to color one of three pictures: a girl with a
doll, priming their gender identity; Asian children eating from a
bowl with chopsticks, priming their Asian identity; or a landscape,
which was the control condition. When reminded of
their gender, the girls performed significantly worse than those in the
control condition. When reminded of their ethnicity, they did
better than those in the control condition. Related studies found similar
effects for German and Italian girls. In literally hundreds of studies 
and many different endeavors, ranging from golfing to standard- 
ized test-taking to entrepreneurship, stereotype threat has been 
found to reduce the performance of the negatively stereotyped 
group members.’ 

What can be done? Consider offering a friendlier environment. 
Remember that women tend to do better on math tests when the 
proportion of men around them is small. Fifteen-year-old girls in 
single-sex schools in the United Kingdom, for example, are as 
willing to take risks as their male counterparts. Girls from mixed- 
sex schools, however, are substantially more averse to taking risks. 
While you might (rightly) worry that different types of children 
attend single-sex as opposed to mixed-sex schools, an experiment 
conducted in Switzerland, where high school students were ran- 
domly assigned to single-sex and mixed-sex classes, rules out 
self-selection effects. The Swiss girls did substantially better in 
mathematics in single-sex classes, were better able to judge their 
abilities, and were more self-confident. 

Instituting single-sex schools would be difficult and expensive,
and as we will see when we spend more time thinking about di- 
versity in Chapter 11, controversial. But there are incredibly 
simple fixes that can move the needle. Relocating the boxes where 
test-takers are asked to select their gender and ethnicity from the 
beginning to the end of a test is one of them. While the impact of
such changes is a matter of debate, I take a pragmatic view:
moving the box from the beginning to the end of the test harms
no one and may help some.!° 

In mixed-sex classrooms, make sure everyone is able to con- 
tribute. In my classroom, I have two important rules: I count to 
five before I call on anyone to make sure I do not always choose 
the people whose hands are up first. And I ask students to build 
on each other’s comments to encourage listening—and a more 
productive classroom discussion. As I have not always been suc- 
cessful at promoting the latter, I have started to experiment with 
a little behavioral intervention. Early in the semester, I pick three 
students at random and ask the first one to summarize in one minute 
why he or she has decided to take the class. I give the person a little 
bit of time to think about what to say. Excellence is not the point 
of the exercise, the class later learns. Then, I ask a second person to 
do the same. Of course, that student has had more time to prepare 
and thus often does a little better. Finally, I invite the third person 
to stand up. But the rules have changed. I ask that student to sum- 
marize what the second person just shared with the class. Usually, 
that’s impossible. The student has been so focused on preparing a 
story that he or she forgot to listen. 

The exercise has several goals: First, I want to highlight that 
what most professors ask students to do, namely listen to what 
others have to say and build on their arguments to advance learning 
in class, is easier said than done. Awareness is a first step toward 
improving. Second, the exercise helps them experience firsthand 
the scientific evidence which indicates that most people are not 
good at multi-tasking. Listening to others and developing an 
argument at the same time is hard, let alone when these tasks are 
combined with the all-too-common student habit of surfing
the internet or texting friends while in class. Finally, I use the
opportunity to explain my five-second rule and a few other strategies
to provide a welcoming environment for those who need a
bit more time to combine thinking with listening. For those 
who would like to test their ideas first before speaking up pub- 
licly, having people meet in small groups first to discuss complex 
or controversial topics strikes me as a very elegant design feature 
to level the playing field and enable everyone to contribute."

Indeed, a desire to create small group contexts was one of the
reasons for Harvard Business School (HBS) to launch a new course 
called “Field” in which students have to work in teams. Analyzing 
class participation in the typical first-year MBA course with eighty 
or ninety students in the room revealed that women and other tra- 
ditionally disadvantaged groups were less likely to contribute. 
Under the leadership of Dean Nitin Nohria, HBS has been 
working hard to create an environment where women—as well 
as men—can thrive. Small groups provide a learning space that 
differs from the large classroom, where students compete for the 
professor’s time and attention and where their participation deter- 
mines a large percentage of their grade. Working in teams, of 
course, also helps students develop important interpersonal skills.

A big part of the behaviorally inspired design changes at HBS
was ensuring fair, gender-neutral performance evaluations. Un- 
derstanding that we have to help faculty, students, and staff get it 
right by making it easier for them to do the right thing, HBS has 
scribes in classrooms to help faculty monitor who they call on. 
They have developed online tools for professors to instantly track 
class participation and grading by collecting data by gender and 
other relevant metrics. One such metric is whether English is a 
person’s native tongue. Even proficient English speakers take longer
to insert themselves in a discussion if English is not their first
language. Any grade based on class participation should be designed
to take this into account. 

HBS has made tremendous progress, both increasing the fraction
of women students and narrowing the gender grade gap,
but this is a work in progress. As Robin Ely said in a 2013 Atlantic 
Monthly article, “There are no silver bullets for dealing with 
second-generation bias—not scribes in the classroom, not software 
that ensures all students have a chance to participate, whether male 
or female, on the left side of the room or the right—because no 
two or three things hold women back.” Frances Frei summed it 
up nicely in a 2013 New York Times article: “We made progress on 
the first-level things, but what it’s permitting us to do is see, holy 
cow, how deep-seated the rest of this is.” Ely and Frei, both fac- 
ulty members at HBS who have been instrumental in changes 
made there, are too modest. Precisely because we can now see how 
ingrained unconscious gender bias is, we can start to design and 
experiment our way out of the thicket of bias. They and their col- 
leagues have started to redesign HBS much like David Coleman 
and colleagues have started to redesign the SAT—removing bias 
from risky environments to level the playing field and let talent 
speak for itself." 


Designing Gender Equality—Reduce Bias 
in Risky Environments 


• Adjust risk when gender differences in willingness to
gamble may bias outcomes.
• Remove clues triggering performance-inhibiting
stereotypes.
• Create environments inclusive of different risk types.


9 


Leveling the Playing Field 


The word bloke is not only used to make Diet Coke appeal to men. 
I was introduced to it in a different context when our two boys 
attended school in Sydney, Australia, where I worked for a se- 
mester. Various Australian schools had launched a new program: 
Boys, Blokes, Books and Bytes (B4). Its goal is to motivate boys 
to read. In short, it is a design that addresses one facet of gender 
inequality. B4 offers events and activities reflecting male learning 
styles and involving male role models, a.k.a. blokes, in an effort to 
close a gender gap in literacy. 

Across the globe, boys are increasingly falling behind in reading 
and writing, as measured in tests such as the National Assessment 
Program Literacy and Numeracy (NAPLAN) in Australia or the 
National Assessment of Educational Progress (NAEP) in the 
United States. Many researchers and policy makers wonder 
whether teaching styles, such as requiring students to sit still for 
long stretches of time, might inhibit male learning. The dearth of 
male teachers and male role models who read likely does not help 
either. After he and his son had participated in one B4 program,
one father, Gary, reported: “I will recommend this course to other
dads because for me it broke down the fear of doing the wrong 
thing when reading with my son.” 

The right thing to do likely entailed nothing more than a father 
reading to his son. I will spend more time discussing the role 
model effect in Chapter 10, but it ought to be stated here that there 
is increasing evidence that having a teacher of the same sex improves 
how not only boys but girls perform and how their teachers per- 
ceive them! 

A 2015 report by the OECD titled The ABC of Gender Equality 
in Education: Aptitude, Behaviour, Confidence found that in all sixty- 
four countries studied, girls outperformed boys in reading and 
writing. By age fifteen, the gender gap in literacy corresponds to 
an extra year of schooling. By comparison, the gender gap in math- 
ematics is shrinking. By age fifteen, boys are ahead of girls by just 
three months. In science, girls and boys perform about equally. 

The problem is well documented, and with varying degrees of 
concern many countries have come up with interventions to ad- 
dress it. What is interesting is that no country has been able to 
close the gender gap in mathematics and in literacy at the same 
time. In fact, it appears as if there was a trade-off between gender 
equality in literacy and gender equality in math. In Latin Amer- 
ican countries, such as Chile, Colombia, Mexico, and Peru, the 
gender gap in reading is small, but the gender gap in mathematics 
is large. In contrast, in Nordic countries such as Iceland, Norway, 
and Sweden, girls do as well as boys in math while boys have not 
caught up with the girls in reading.²

How can we design environments in which both boys and girls 
thrive? What if we cannot have it all? If boys and girls differ in
how they learn best, designing environments that truly level the
playing field between men and women will prove difficult.
Instead, it will favor one of the two sexes. Maybe this is what 
the apparent trade-off between gender equality in math and lit- 
eracy across countries is telling us. In some cases, trade-offs are 
explicit and certain games are zero-sum. Given that budgets are 
limited, and even more so in poorer countries, every scholarship 
to improve access only for girls or only for boys is a scholarship that 
does not go to a member of the other sex. What is more, this is 
funding that is not being used for interventions that might benefit 
boys and girls equally. 

While it is true that coming up with gender-neutral procedures 
is hard, it is not impossible. Recall again the blind auditions used 
by many orchestras. Creativity, data, and experimentation will 
help. And while some situations involve trade-offs, others clearly 
do not. Firms and schools want their workers and students to play 
their very best game. Thoughtful design architecture will help. 
That male and female students, workers, managers, politicians, and 
leaders do not necessarily thrive under the same conditions will 
require care and attention when choosing behavioral designs that 
may impact groups differentially. 

To better understand potential trade-offs, researchers have 
started to tease apart three effects: the impacts on the targeted 
group, the untargeted group, and overall. For example, a random- 
ized controlled trial evaluating the impact of merit-based scholar- 
ships for girls in Kenya found that they significantly improved the 
recipients’ test scores. Interestingly, what could have been an in- 
tervention favoring one sex over the other turned out to have 
broader beneficial effects. The scholarships also had a positive
impact on their female and male classmates who had not received any
support. What is more, the treatment schools eligible for the
scholarship program also saw a 5 percent increase in teacher
attendance rates as compared to the control schools that did not have
access to the program, benefiting boys and girls equally. Thus, a 
program aimed at a small subset of the population can benefit the 
entire population through spillover effects, in this case, through 
peer effects and motivational impacts on teachers. 

This is good news—but of course, that is not always the case. 
More importantly, we cannot stop there. Another question a re- 
sponsible designer has to ask is whether merit-based scholarships 
to girls were the most cost-effective way to achieve the positive 
impacts mentioned above. In this case, they were not. To better 
understand this much understudied question, a group of researchers 
at the Abdul Latif Jameel Poverty Action Lab at MIT recently set 
out to develop a framework for “comparative cost-effectiveness 
analysis” and used educational interventions as their first applica- 
tion. As you might expect, scholarships are an expensive way to 
improve school attendance and performance in developing coun- 
tries. But it is not textbooks, subsidized meals, or free school 
uniforms that promise the most impact at the lowest cost. Instead, 
one of the most cost-effective ways to improve school attendance 
came as a big surprise. It was identified shortly before the turn of 
the century and since has become one of the most impressive 
success stories of evidence-based decision-making: deworming, 
an intervention aimed at improving children’s health. In many 
cases, the binding constraint keeping both boys and girls out of 
school is their poor health. 

Based on the evidence collected by two development econo- 
mists, Michael Kremer and Ted Miguel, in randomized controlled 
trials in Kenya, school-based deworming programs have now been
scaled up by Evidence Action, a nonprofit organization that
implements cost-effective and evidence-based interventions,
to almost 100 million children in Kenya and India through its 
Deworm the World Initiative. In their evaluation in Kenya, school- 
based deworming decreased serious worm infections by half and 
school absenteeism of boys and girls by one-quarter—at a cost of 
less than fifty cents per child per year.⁴

So the answer to my question is: yes, we can design environ- 
ments where both girls and boys thrive. Learning from deworming 
and the redesigned SAT, we can equalize the playing field. De- 
worming is a cost-effective intervention that does not have 
sex-specific impacts, and taking out risk from the SAT turned a 
gender-biased instrument to measure ability and performance into 
a gender-neutral one. Sometimes particular cases resist gender- 
neutral redesign and will continue to impact men and women 
differently. Even in such cases not all is lost. 

Consider overconfidence, one of the most potent and perva- 
sive biases. Overconfidence “has been blamed for wars, stock 
market bubbles, strikes, unnecessary lawsuits, high rates of entre- 
preneurial bankruptcy, and the failure of corporate mergers and 
acquisitions,” as Max Bazerman and Don Moore report in their 
review. While it appears in various shapes and forms, one simple 
illustration is the “better-than-average” effect. When I ask my stu- 
dents to indicate how well they drive or what they expect their 
final grade to be, 70 to 80 percent believe that they are better than 
average. Thirty percent expected their grades to be in the top 
10 percent. Much research suggests the phenomenon “not only to 
be marked but nearly universal.” 

But not quite. It turns out that men tend to be substantially 
more overconfident than women. They are not only willing to
take more risks when the risk factors are known, as discussed in
Chapter 8, but they are also more optimistic when assessing risky
situations. Men seem to be particularly overconfident in areas 
where they are expected to have expertise. For example, Brad 
Barber and Terry Odean looked at data from a large discount bro- 
kerage firm and found that male investors were so overconfident 
in their own ability that they traded 45 percent more than female 
investors and as a consequence made significantly less money than 
the women did. 

The evidence suggests strongly that it is almost impossible for 
us to assess ourselves objectively, and de-biasing employees who 
think more highly of themselves than others is very challenging. 
A meta-analysis examining perceptions of leadership effective- 
ness across nearly 100 independent samples found that men per- 
ceived themselves as being significantly more effective than women 
did when, in fact, they were rated by others as significantly less 
effective. 

Women tend to be overconfident to a lesser degree and in some 
instances even underconfident. For example, women tend to un- 
derestimate their ability in mathematics and overestimate how 
good they have to be to succeed in higher-level math courses. 
Generally, female students are more likely than male students to 
take grades as an indicator of their mastery of a given subject. They 
are more likely than their male counterparts to drop courses in 
which their grades are lower; in the United States this is some- 
times referred to as the “fear of B- effect.” The loss of self-esteem 
in introductory math and science courses has been identified as an 
important factor explaining why women drop out of science and 
engineering majors.⁵

We have discussed performance appraisals previously, but it is 
worth revisiting them here: gender differences in self-confidence
are also a concern for performance appraisals. Many firms ask their
employees to evaluate themselves and then to share these self- 
evaluations with their supervisors. Consider the following sce- 
nario: supervisors must rate subordinates on a scale from one to 
ten, with ten indicating perfect performance. On their own, the 
managers would give each of their two direct reports, a man and a 
woman, a score of seven. But the male employee is overoptimistic 
about his abilities and the female employee underconfident; the 
former evaluates himself as a nine and the latter gives herself a 
five. Keeping the average score across the two constant and 
thereby meeting firm requirements regarding curves, the super- 
visor is tempted to adjust his or her initial score, making the male 
employee less unhappy by only downgrading him to an eight and 
making the female employee happy by upgrading her to a six. 
This is another example of the anchoring effect: inadvertently, sub- 
ordinates have thrown a reference point at their managers who 
then cannot help but take it into account when making their own 
assessments. 
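The arithmetic of this scenario can be sketched in a few lines of Python. This is only an illustration, not a model taken from the research: the function name `anchored_score` and the anchoring weight of 0.5 are assumptions chosen so that the numbers match the example above, in which the supervisor's unaided score is seven and the self-ratings are nine and five.

```python
def anchored_score(own_view: float, self_rating: float, weight: float = 0.5) -> float:
    """Blend the supervisor's own view with the employee's self-rating.

    `weight` is a hypothetical anchoring strength: 0 ignores the
    self-rating entirely, 1 adopts it outright.
    """
    return own_view + weight * (self_rating - own_view)


supervisor_view = 7                      # score the manager would give unaided
self_ratings = {"man": 9, "woman": 5}    # self-evaluations from the scenario

final = {who: anchored_score(supervisor_view, rating)
         for who, rating in self_ratings.items()}
average = sum(final.values()) / len(final)

print(final)    # {'man': 8.0, 'woman': 6.0}
print(average)  # 7.0
```

The average stays at seven, so the curve is respected, yet two employees whom the supervisor judged to be equal end up two points apart. Setting `weight` to zero, which is what withholding self-evaluations amounts to, returns both scores to seven.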

Fixing anchoring effects in performance appraisals is easy: 
simply do not share employees’ self-evaluations with their managers 
before they make up their own minds. Or even better: do away 
with self-evaluations altogether. To date, I have not come across 
any evidence suggesting that having people rate themselves yields 
any benefits for themselves or the organization. Consider junking 
self-evaluations as a mitigation strategy. While you cannot keep 
women from evaluating themselves more harshly than their male 
colleagues evaluate themselves, you can contain the impact the 
gender difference in this bias has on people’s lives.⁶

Sometimes, you can intervene even a little bit earlier. You can
give people feedback on how well they are doing, maybe as
compared to others, to help them update their biased beliefs and
reassess their standing. This design has proven to be particularly
effective in mitigating the gender gap in willingness to
compete. The original studies examining men’s and women’s will- 
ingness to compete with others were conducted in Israel by Uri 
Gneezy, Muriel Niederle, and Aldo Rustichini. The researchers 
had the participants of their studies join in a task for which they 
either were paid a fixed amount for every problem solved correctly 
(a piece rate) or were paid tournament-style, where earnings
depended on relative performance. Men and women performed 
about equally in the piece-rate scheme, but in the tournament 
setting men worked harder and did better. What is more, when 
given the choice between being paid based on a piece rate or a 
tournament scheme, women “shy away” from competition while 
men “embrace” it, subsequent research with American study 
participants by Niederle and Lise Vesterlund showed.⁷

This leads to inefficiency: we have too many of the low- 
performing but overconfident men and too few of the high- 
performing but less confident women choosing to compete. 
Gender differences in risk aversion and overconfidence can explain 
some of this, but it also appears as if women just disliked com- 
peting more than men. And, again, it matters. In the Netherlands, 
willingness to compete has been shown to predict which high 
school tracks students choose. The more competitive Dutch stu- 
dents, who tend to be the male students, are more likely to choose 
mathematics and science, which are the more prestigious, higher 
return tracks. For MBA graduates at Booth School of Business at 
the University of Chicago, interpersonal differences in willingness 
to compete translate into significant differences in earnings, explain
part of the gender gap in earnings, and have long-lasting effects:
competitive graduates are more likely to work in higher-paying
industries nine years after graduation.⁸

The impact of gender on competitiveness has been studied ex- 
tensively in the laboratory and the field. Results appear to be sen- 
sitive to the task involved, the composition of the team, and the 
context in which it is undertaken. One study, particularly inter- 
ested in these contextual effects, pushed the envelope to find two 
societies with substantive differences in their gender relations. The 
researchers chose the Maasai in Tanzania, an extremely patriar- 
chal society where “women are said to be less important than 
cattle,” and the Khasi in Northeast India, a matrilineal society in 
which women are the heads of households. They invited the vil- 
lagers to participate in a task where they had to try to throw a ball 
in a bucket ten times. The participants were asked whether they 
wanted to be paid a piece rate, receiving a fixed sum for every suc- 
cessful throw, or to compete with another, anonymous villager 
participating in the same task at a different location. If they chose 
the tournament, the winner would receive three times the fixed 
amount and the loser would receive nothing. 
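To see what this choice asks of a participant, consider a minimal sketch of the payoffs. It rests on assumptions that go beyond the description above: the participant is treated as risk neutral, ties are ignored, the tournament prize is taken to be three times the piece rate for each successful throw, and the function names are hypothetical.

```python
def piece_rate_pay(successes: int, rate: float) -> float:
    """Certain pay: a fixed sum for every successful throw."""
    return rate * successes


def expected_tournament_pay(successes: int, rate: float, p_win: float) -> float:
    """Expected pay when the winner earns three times the rate and the loser nothing."""
    return p_win * 3 * rate * successes


# Under these assumptions, competing pays off in expectation only if the
# participant believes the chance of winning exceeds one in three.
BREAK_EVEN_WIN_PROBABILITY = 1 / 3


def prefers_tournament(p_win: float) -> bool:
    return p_win > BREAK_EVEN_WIN_PROBABILITY


print(piece_rate_pay(4, 1.0))                  # 4.0
print(expected_tournament_pay(4, 1.0, 0.5))    # 6.0
print(expected_tournament_pay(4, 1.0, 0.25))   # 3.0
```

In this simplified view, someone who thinks the odds of winning are even should choose the tournament. A participant who declines it either believes the chance of winning is below one-third, is averse to risk, or dislikes competing as such.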

The Maasai behaved pretty much like Americans: half the men 
and only a quarter of the women decided to compete. In contrast, 
more than half of the Khasi women decided to compete while men 
were about 15 percentage points less likely to enter a competition. 
While the Maasai and Khasi differ in many ways in addition to 
their gender relations, the experiments provide support for a 
nurture-based account of competitiveness. 

Other work suggests that the gender difference in competition 
is particularly pronounced for male-typed tasks, such as a math 
task, while it disappears in female-typed tasks, such as a verbal task.
As in test-taking, who else is in the room matters. When it was
only women in the room, women were more likely to compete
and did better in competitions. Another interesting example in- 
volved seven- to ten-year-old Swedish children, male and female, 
among whom Anna Dreber of the Stockholm School of Economics 
and her colleagues found no gender differences in competitive- 
ness at all. In their sample, it did not matter whether the kids tried 
to out-run or out-dance each other, activities associated with
maleness and femaleness respectively.⁹

Sweden appears to have created an equal playing field where 
there are no gender differences in competition. More generally, 
Sweden and other Nordic countries consistently appear at the top 
of rankings measuring gender equality. But turning countries 
like the United States into Sweden is a tall order. While a topic 
of extensive debate, gender equality in the Nordic countries is 
likely the result of a complex interplay of a large number of fac- 
tors. Fortunately, without attempting to turn countries into 
Sweden, there is lower-hanging fruit anyone can harvest.¹⁰

For example, we can help people make more accurate self- 
assessments. The larger goal, both societal and within organizations, 
is to encourage the right people to participate in competitions—not 
the overconfident ones but the most able ones. Good feedback on 
how well someone is doing compared to others can do this for your 
organization. In various experiments employing the designs intro- 
duced above where people participate in piece-rate or in tourna- 
ment schemes, information about relative performance has been 
shown to move high-ability women to more competitive
environments and low-ability men to less competitive forms of
compensation, such as piece rates. It eliminated gender differences in choices.¹¹

Feedback about relative performance can help those less 
confident about their abilities, more risk averse, and less inclined
to compete know if their hesitation is warranted or not. It also
provides a reality check for those too eager to compete
but unlikely to win. A while back, Robert Frank and Philip
Cook wrote a fascinating book, The Winner-Take-All Society, in
which they suggested that too many people waste their lives running
after the big prize. Achieving the successes of Serena Williams or 
Roger Federer in tennis indeed is rare and becoming a professional 
tennis player truly is risky business. Overconfidence in pursuit of 
the unattainable is costly for individuals and organizations;
calibrating ambition to ability is a lifelong lesson. Designs can help.¹²

Not sharing biased self-evaluations with others to avoid con- 
taminating their judgments, and offering feedback to the under- 
and overconfident to help them make more accurate assessments 
are attractive designs that help equalize the playing field among 
people with different degrees of self-confidence and competi- 
tiveness. But we enter rocky terrain here. There has been much 
rhetoric, much of it overblown and not well grounded in evi- 
dence, about the differences between men and women. Are 
women really “from Venus” and men “from Mars,” as a 1992 best 
seller by John Gray, a relationship counselor, suggests, and do these 
differences indeed predict “The End of Men,” as a much-discussed 
2010 article by Hanna Rosin in the Atlantic Monthly and a later 
book argue?¹³

Whatever the mechanisms driving differences in values and at- 
titudes between men and women, the variance across countries 
suggests that they cannot be attributed to nature—at least, not ex- 
clusively. To be sure, various hormones have been identified as 
being related to certain behaviors. In Chapter 8 we discussed the 
relationship between testosterone and risk taking. Additional
evidence from Swedish university students suggests that prenatal
exposure to testosterone influences a child’s willingness to take risks.
In addition, the menstrual cycle has been shown to affect women’s 
willingness to assume risk. But I do not expect science to ever re- 
solve this question and determine what fraction of the observed 
gender difference is due to nature and what is due to nurture.¹⁴

Consider another potential gender difference about which there 
is much speculation: do women care more about others than men 
do? Richard Zeckhauser and his collaborator, John Rizzo, set out to 
examine whether or not this was relevant for physicians. They an- 
alyzed data from a nationally representative sample of young 
physicians in the United States from the 1987 and 1991 Practice 
Patterns of Young Physicians Surveys. The data showed that fe- 
male physicians tended to work fewer hours per week and spent 
more time with their patients than their male counterparts. They 
also opted for less costly and invasive treatments. For example, 
female obstetricians/gynecologists performed cesarean sections 
and hysterectomies significantly less often than their male col- 
leagues. But how does this work? Do male and female physicians 
“want” to do this, or are there other forces at play? 

It turns out that young male physicians set higher income goals 
for themselves (what they refer to as “adequate income” in the 
survey) than women did and then needed to live up to them. We 
do not know what causes this. It may be due to men caring more 
about money, as the question might have suggested, but it could 
also be a reflection of the expectation that men be the main 
breadwinner in a family. Or it could simply reflect male over- 
confidence as discussed earlier. No matter the underlying cause, 
physicians who want to close an earnings shortfall between their 
aspirations and reality have two options: they can work more
hours, or they can try to make more money per hour worked (or,
of course, some combination of the two). They can boost their
hourly incomes by charging more, seeing more patients per hour, 
and prescribing more procedures, particularly choosing those that 
reimburse the prescribing doctor at a higher level. Or they can 
move their offices to an area that commands higher prices for med- 
ical services. Women, it was found, pursue none of these strate- 
gies. Of course, having set lower income goals for themselves, they 
have a smaller shortfall to close. In contrast, male physicians typi- 
cally choose the second option: they raise their hourly incomes by 
pursuing an array of strategies aimed at boosting their earnings 
to better align with their goals. In this study, the gender gap in 
income goals and in responsiveness to those goals fully explains the 
substantial gender gap in earnings and in earnings growth over 
time. 

Based at least on this study, female physicians do not intrinsi- 
cally care more for their patients but rather can afford to do so 
given the goals they set for themselves. While patients likely care 
little about where the difference in their treatment quality comes 
from, for a behavioral designer the root causes are key. If the reason 
for the difference in care is doctors’ aspirations, as the study sug- 
gests, we clearly have to attack the doctors’ goal-setting process. 
If the root cause lies in how much doctors are concerned with 
others’ well-being, changing values should be our objective— 
which is likely the harder problem to address through design. 
Thankfully, the evidence does not suggest that we have to try to 
attack values. In fact, the evidence on potential gender differences 
in regard for others is less clear than you might think. 

Much of the laboratory evidence on “other-regarding”
behavior stems from a simple experiment called the dictator game.
Two people are involved. One of them is randomly chosen to
receive a certain amount of money. The other person does not
receive anything, and, indeed, is a passive participant. The first 
person is then asked how much, if anything, he or she wants to 
share with the second person. Women tend to share a bit more 
with their counterpart and keep a bit less for themselves than men 
do. But the differences are generally small and depend on context. 
For example, men have been found to be more responsive to the 
price of altruism than women. When the rules of the game stipu- 
late that giving up $50 will translate into someone receiving $100, 
they are more willing to do so. In contrast, women willing to give 
up $50 hardly care about whether their donation translates into a 
benefit of $50 or $100 for others. (This means that charities in 
countries offering tax deductions to donors are marginally better 
off targeting men than women.)¹⁵

Even if the gender differences in how others are regarded are 
tiny and depend on context, they can turn into self-fulfilling 
prophecies, and belonging to a social category expected to be more 
other-regarding can turn into a liability. Consider the following 
clever experiment by Lise Vesterlund, Linda Babcock, Maria Re- 
calde, and Laurie Weingart. Motivated by their own experiences 
as female faculty members, they wanted to explore whether women 
had a harder time saying “no” when asked for a favor. They were 
worried that women might spend more time on what they refer 
to as “non-promotable tasks” such as serving on administrative 
committees, organizing an event, or mentoring and evaluating 
others, and thus have a harder time climbing up the career ladder. 
Their experiment was designed to capture the incentives people 
confront when asked to undertake a task that they would benefit 
from, but which they would prefer to leave to somebody else. Most
university faculty members prefer someone else to organize their
department’s speaker series and most partners in law firms would
rather have someone else mentor their firm’s summer associates. But 
in both cases, faculty and partners benefit from great talks and won- 
derful future colleagues. 

In the experiment, people were assigned to groups and their 
earnings depended on finding one group member willing to vol- 
unteer. They were all better off if one person volunteered, but vol- 
unteering was costly for the person doing it. When in same-sex 
groups, women and men were equally willing to volunteer. But 
when grouped with members of the opposite sex, the pattern sud- 
denly changed. Women volunteered more and men less. Everyone, 
including the women, assumed women would volunteer more than 
men. Accordingly, men adjusted their behavior, expecting to ben- 
efit from the women, and women lived up to their expectations. 

This is a common pattern at universities. Researchers at one 
large university confirmed that there, too, female faculty were 
more likely to say “yes” to a request to serve on an administra- 
tive committee than their male counterparts. Committee work 
is an important service to the university, but it rarely benefits the 
individuals participating. It is troubling that this pattern appears 
to be generalizable. In an excellent review of why there are so 
few women working in STEM fields, Stephen Ceci of Cornell 
and colleagues discuss various studies finding that male faculty 
spend less time on service activities and teaching but more on 
research, the most promotable task in academia. To what degree 
these differences inhibit women’s careers is an open question. To 
answer it, we would need a detailed analysis of gender differences 
in time allocation by field. For example, do female academics in 
fields where women have traditionally been under-represented,
such as engineering, economics, or mathematics, have less time
to do research than women in psychology, and the social and life
sciences? We do not know. For now, Ceci and colleagues conclude 
that mostly, “current barriers to women’s full participation in 
mathematically intense academic science fields are rooted in
pre-college factors.”¹⁶

If you are concerned about gender-based imbalances in your 
organization, you can easily follow in Harvard Kennedy School’s 
footsteps. We try to measure (and compensate) as many contri- 
butions relevant to the institution as possible. We employ a point 
system to measure faculty’s workload. A full-time faculty mem- 
ber’s workload is 100 points, with a margin of error of 10 percent 
plus or minus. If faculty members contribute substantially more, 
the school compensates them more. If a faculty member falls sig- 
nificantly short of workload expectations, explanations and some- 
times adjustments in compensation or time status are in order. Points 
are allocated for teaching and administrative tasks, and faculty 
have substantial flexibility in how they meet their obligations. Some 
might choose to teach more than the minimal requirement, others 
might give more of their time and effort to service or organiza- 
tional leadership opportunities. 
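For readers who want to try something similar, the bookkeeping behind such a point system is simple. The sketch below is an illustration only, not the school's actual system: the 100-point full load and the 10 percent margin come from the description above, while the activity names, the point values, and the reading of “substantially more” and “significantly short” as falling outside that margin are assumptions.

```python
FULL_LOAD = 100     # points for a full-time faculty member (from the text)
TOLERANCE = 0.10    # plus or minus 10 percent (from the text)


def workload_status(points_by_activity: dict) -> str:
    """Classify a total workload against the full-load band.

    Assumption: anything outside the 10 percent margin counts as
    substantially above or significantly below expectations.
    """
    total = sum(points_by_activity.values())
    low = FULL_LOAD * (1 - TOLERANCE)
    high = FULL_LOAD * (1 + TOLERANCE)
    if total > high:
        return "above band: extra compensation"
    if total < low:
        return "below band: explanation or adjustment"
    return "within band"


# Hypothetical faculty members with made-up point allocations.
print(workload_status({"teaching": 70, "committees": 25, "mentoring": 20}))
# above band: extra compensation
print(workload_status({"teaching": 95}))
# within band
print(workload_status({"teaching": 60, "committees": 20}))
# below band: explanation or adjustment
```

Because service activities are counted in the same total as teaching, the first faculty member's committee work and mentoring are visible and rewarded instead of being absorbed silently.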

The point system has lots of advantages, providing incentives 
for people to deliver the public goods everyone benefits from. The 
flexibility allows them to trade off activities they are less good 
at for tasks they are better at, making everyone better off. As a de- 
sign, this is almost a pure win-win. True, not everything is quan- 
tifiable, which is why every few years we have a discussion about 
whether our workload system rewards effort adequately and does 
not crowd out intrinsic motivation. But I do not mind, and I
believe most of my colleagues prefer our system to the less flexible,
more opaque, and seemingly less equitable systems followed at
other academic institutions. At the very least, the Kennedy School’s
points bring inequities out into the open and make service activi- 
ties a part of any conversation about faculty contributions. 

The larger issue is, of course, this. Through the measurement 
of all contributions, we can correct for differential impacts of 
gender bias ex-post. Ideally, we would level the playing field ex- 
ante or intervene to mitigate impacts before they fully hit. But if 
we can’t, ex-post compensation for career-relevant imbalances 
resulting from the expectations placed on one’s gender can be a 
helpful design feature. 


Designing Gender Equality—Level the Playing Field 


• Prevent gender bias from having an impact: use
gender-neutral designs.

• Mitigate the impact of gender bias on yourself and
others; do not share biased self-assessments with
supervisors; give feedback to help people correct their biases.

• Compensate for differential impact due to gender bias.


Part Four 


HOW TO DESIGN DIVERSITY


Creating Role Models


As in most office lobbies, when you enter Harvard Kennedy School
you will see portraits on the wall. They hang in our public spaces, 
classrooms, conference centers, and office suites. Inspired by 
research findings, we recently added a few new ones. Made pos- 
sible through the leadership of my colleague, Jenny Mansbridge, 
and the Women and Public Policy Program, they now include 
Ida B. Wells, the US civil rights activist and suffragist; Abigail
Adams, the second First Lady of the United States; Edith Stokey,
economist and “founding mother” of the Kennedy School; and
Ellen Johnson Sirleaf, President of Liberia, winner of the Nobel 
Peace Prize, and a graduate of the school. 

This is a work in progress. Harvard still has more to do. The 
student newspaper, the Crimson, reported in March 2012: “Of the 
more than 60 figures portrayed in the art of Annenberg Hall, 
three are women—two of them are tending to children, while 
the third welcomes her warrior husband home to a life of do- 
mestic tranquility.” 


New portraits of women leaders at Harvard Kennedy School: Ellen Johnson 
Sirleaf, President of Liberia


Does it matter? Yes. Students’ attitudes can be affected by subtle
and simple changes. Sapna Cheryan of the University of Wash- 
ington has demonstrated that just by changing decorations in a 
computer science classroom, losing the Star Wars and Star Trek 
images for gender-neutral art and nature pictures, female students’ 
associations between women and careers in computer science were 
strengthened. Even one’s choice of screen saver can have a conse- 
quence. One study subtly exposed people either to a picture of 
Hillary Clinton, Angela Merkel, Bill Clinton, or no picture be- 
fore they had to give a public speech. Women who had seen a pic- 
ture of a female leader gave longer speeches that were rated higher 
both by external observers and by the women themselves
than those who had seen a picture of Bill Clinton or no picture. 
Role models did not affect men. They did equally well, whether
exposed to Bill or to Hillary Clinton or to no picture. Another 
study showed that pictures are not even necessary—just asking 
people to imagine what a “strong woman” looks like can
undermine stereotypes.¹

Most organizations confronted with such simple design choices 
act unthinkingly. When entering a board room, you typically meet 
the previous—typically all male—company leaders. Correcting 
this sort of gender inequality through design is the very definition 
of low-hanging fruit, or at about the height one hangs a picture. 
One multinational I advise often gathered in one room when 
reaching important promotion decisions. Its walls were decorated 
with the portraits of previous CEOs. All were men, of course, and, 
as I pointed out, this was hardly conducive to triggering counterste- 
reotypical associations between gender and leadership. Indeed, 
only ten years ago, all portraits at the Kennedy School had been
of men.

The ability of role models, in portraits and more importantly
in the flesh, to influence gender inequality is both encouraging 
and muddy. We have already encountered in Chapter 9 research 
pointing out the positive consequences of having same-sex instruc- 
tors and students, whether to encourage boys to read or girls to 
excel at math. But the promise of role models promoting the moral 
imperative of gender equality may be far greater. In 1993, the In- 
dian government amended its constitution with the following 
provision: Going forward, village councils needed to reserve one- 
third of their seats for women. Additionally, one-third of council 
leaders, the pradhans, had to be women. The amendment was described
as “landmark legislation” that would “forever change the face of
rural Indian politics” and as “one of the best innovations in
grassroots democracy in the world.” Under it, more than 1.5 million
women were to be elected, representing 800 million people in the
world’s oldest and largest democracy.²

In a speech late in 2011, then US secretary of state Hillary 
Rodham Clinton built on India’s amazing innovation to launch 
the Women in Public Service Project, a collaboration between the 
US State Department and five all-female colleges—Barnard, Bryn 
Mawr, Mount Holyoke, Smith, and Wellesley—aimed at empow- 
ering women from around the world to serve in the public sector. 
The tangible accomplishments of India’s experiment inspired the 
project’s aims. When introducing the project, Secretary Clinton 
described just some of the Indian legislation’s impact: “Over a very 
short period of time, studies showed that women in these posi- 
tions [India’s village council leaders] started investing more in 
public services, from clean water to police responsiveness, than 
their male counterparts had. And there were other benefits. With
more women installed as council leaders, more women spoke up
in council meetings than ever before. And in a nation where the
under-reporting of crimes against women is widespread, more 
women came forward to file complaints about abuse, because they 
were more confident that the police would take action.” 

We know this and more because India’s path-breaking legisla- 
tion was implemented in a way that allowed a number of social 
scientists, including Esther Duflo of MIT and Rohini Pande of the 
Kennedy School, to rigorously evaluate its impact. Not only did
the law mandate that one-third of village leaders in a district be
women, it also required that these villages be picked randomly
out of a hat in each election cycle. This turned the intervention
into a “natural experiment” in which a random sample of villages
received the “treatment,” in other words, a female leader, while
the others did not. Thus, the researchers could analyze villages
that were comparable except for the sex of the village chief
and measure what difference the mandated change really made.

I often describe this research as constituting the “gold standard” 
of how to measure the impact of quotas. The evidence gained from 
India’s intervention is based on a true experiment, allowing us to 
make causal inferences about the impact of quotas. In no other 
country in the world have quotas been introduced randomly. Usu- 
ally, we have to rely on “before-and-after” comparisons or cor- 
relational analyses. While helpful, this data is much less reliable 
than experimental evidence. 
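
The logic of such a lottery can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the village identifiers, the outcome index, and the helper names `assign_reserved` and `difference_in_means` are hypothetical, and none of the numbers come from the Indian studies.

```python
import random
from statistics import mean


def assign_reserved(village_ids, share, rng):
    """Pick a fixed share of villages at random, as if out of a hat."""
    n_reserved = round(len(village_ids) * share)
    return set(rng.sample(list(village_ids), n_reserved))


def difference_in_means(outcomes, reserved):
    """Average outcome in reserved villages minus the average elsewhere."""
    treated = [y for v, y in outcomes.items() if v in reserved]
    control = [y for v, y in outcomes.items() if v not in reserved]
    return mean(treated) - mean(control)


# Hypothetical example: nine villages, one-third reserved by lottery.
rng = random.Random(0)  # fixed seed so the draw is repeatable
villages = list(range(9))
reserved = assign_reserved(villages, 1 / 3, rng)

# Invented outcome index, set two points higher in reserved villages.
outcomes = {v: 10 + (2 if v in reserved else 0) for v in villages}
effect = difference_in_means(outcomes, reserved)
```

Because chance alone decides which villages are reserved, the two groups should not differ systematically in anything else, so a simple difference in averages can be read as the effect of the reservation. A before-and-after comparison offers no such guarantee.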

Through the Panchayati Raj Act, as the amendment is com- 
monly called, India was able to increase the share of women in 
local government from 5 percent in 1993 to 40 percent by 2005, 
well exceeding the mandated quota of 33 percent. And despite
early rhetoric suggesting that the female council members were
either ill qualified or the relatives of powerful men just serving as
their proxies, the women village leaders did all the good things
Secretary Clinton mentioned in her speech: they provided more
key public goods such as drinking water, roads, and education,
increased the reporting of crimes (including rapes), and accepted
fewer bribes than their male counterparts.

The women leaders also substantively changed the face of 
politics. They served as role models in various ways and with mea- 
surable consequences. With the advent of female village leaders, 
the likelihood that a woman spoke up in a village meeting in- 
creased by 25 percent. If a seat had been reserved for a woman 
in the previous election cycle, women were more likely to run 
in a subsequent open election, competing with men, a study for 
the state of Maharashtra showed. Seeing women leaders changed 
perceptions—making women more confident that they could run 
for public office and making men more accepting of women as 
leaders. 

Villagers who had been exposed to at least two female chiefs 
in West Bengal overcame their initial bias against women as leaders 
and rated male and female leaders equally. This included male 
villagers who, when the quota was first introduced, declared that 
they did not like voting for women. But after being exposed to a 
minimum of two female leaders, they had become comfortable 
with voting for them. 

Administration of the Implicit Association Test confirmed that 
female role models influence gender bias, up to a point. Male
villagers who had never had a female chief consistently rated female
chiefs lower than male chiefs. But male villagers who had had female
chiefs rated male chiefs as less effective. This finding is
particularly remarkable given the evidence from a survey that sought to
discover if ratings of effectiveness translated into differences in who
people liked. Sadly, the answer was no. Male villagers exposed to 
female chiefs might rate them higher, sometimes even higher than 
male chiefs, but they still found male leaders more likable— 
suggesting that the competence-likability dilemma that Amer- 
ican women in male-dominated fields face also applies to female 
leaders in India. 

Evidence for the benefits of India’s grand experiment,
however, does not stop with perceptions of effectiveness. Perhaps
most strikingly, female chief role models affected parents’ career 
aspirations for their children. After having experienced a female 
chief twice, parents were more likely to want their daughters to 
study past secondary school, basically eliminating the gender gap 
in aspirations. The influence didn’t end with the parents but ex- 
tended to their daughters. Girls exposed to female village chiefs 
spent less time on household activities and wanted to marry later. 
The quota system had created role models for the girls and their 
parents, enabling both to imagine and see the value of a different 
future. A comment made by Abhijit Banerjee of MIT—the au- 
thor, together with Esther Duflo, of the important book Poor 
Economics—gets at this shift in attitudes. Interviewed for the film 
Gender Equality: The Smart Thing to Do, which the Women and 
Public Policy Program produced in 2011, he reported hearing 
something unexpected and attributable directly to the Panchayati 
Raj Act. When he talked with parents in rural India, they spoke of 
a new aspiration for their daughters, “to be in politics because there’s 
a lot of reservations for jobs for women in politics. That’s an answer 
that I would not have expected, or didn’t hear, ten years ago.”³

Overall, the introduction of political gender quotas in India
was a success. The act of seeing women lead increased women’s
self-confidence and their willingness to compete in male-dominated
domains, and it changed men’s and women’s beliefs
about what an effective leader looked like. This mattered for elec- 
tions, making everyone more willing to vote for women. It also 
influenced parents and girls, allowing them to anticipate and strive 
for political office. 

At the time India passed the amendment, nothing was certain. 
No one knew what would happen. Indeed, many feared that man- 
dating exposure to women leaders through reservations would 
completely backfire. People, it was thought, would perceive gender 
quotas as unfair or as violating gender norms, or they might simply 
dislike the restrictions quotas impose on their free exercise of 
choice. While surely some Indian villages must have felt, and even
acted on, some of these concerns, the mandate solved a “chicken
and egg” problem: If people are biased against female leaders and 
never see a woman in a leadership position, they can never update 
their beliefs. With a third of village council seats reserved for 
women, Indian voters were given a chance to evaluate real female 
leaders carrying out their duties in office instead of basing their 
beliefs on stereotypical caricatures. And all Indian women were 
given role models to learn from and be inspired by.⁴

Norway is another case in point. In 2003, Norway legislated a 
40 percent minimum representation of each sex on the corporate 
boards of public limited and state-owned companies. While it is 
still early days, Marianne Bertrand and collaborators tentatively 
conclude that the introduction of the board quotas may have trig- 
gered further corporate changes at the top, increasing the share of 
top managers who are women, but did not influence the total 
number of women employed in a given firm. A study of large
corporations in the United States finds similar positive associations
between an increase in the number of female directors and moves
toward more top female executives. Overall, however, the research 
does not suggest that more women on corporate boards and in cor- 
porate suites produced gender de-biasing outside of the firm. Com- 
pared to India’s experiment with village chiefs, the role-model 
effects of Norway's corporate quota seem to be limited to those with 
direct exposure. Stating perhaps the obvious, the Norwegian and
US evidence underscores that a female director of a corporation is
much less publicly visible than a female politician.⁵

Senior managers might well be important role models. Based 
on panel data from over 20,000 US firms across all states and in- 
dustries from 1990 to 2003 provided by the Equal Employment 
Opportunity Commission, researchers have found that when the
share of female top managers increased, the share of women in
mid-level management subsequently rose as well. The effect was
most pronounced for white women, but African American,
Hispanic, and Asian women also benefited from same-sex, same-race
superiors.⁶

Do not miss the obvious point hidden in these analyses.
The evidence on role models points to some of the easiest solu- 
tions: role modeling can start with you. Learn from the people 
who have inspired you. Often, this starts at home with parents and 
grandparents, brothers and sisters, nieces and nephews, in-laws, 
and other relatives and friends. Or take a look at Melanne Verveer 
and Kim Azzarelli’s inspiring book Fast Forward, where you will 
learn from more than seventy “trailblazing women.” You will en- 
counter former US secretaries of state Madeleine Albright, Hillary 
Rodham Clinton, and Condoleezza Rice; Managing Director of 
the IMF Christine Lagarde of France; founder of the Self-Employed 
Women’s Association (SEWA) Ela Bhatt of India; the Nobel Peace
Prize Laureates Leymah Gbowee and President Ellen Johnson
Sirleaf of Liberia, Tawakkol Karman of Yemen, and Malala
Yousafzai of Pakistan; US talk show host Oprah Winfrey and
Chinese talk show host Lan Yang; and change-makers Cherie Booth
Blair, Tina Brown, Katie Couric, Geena Davis, Abigail Disney,
America Ferrera, Melinda Gates, Senator Kirsten Gillibrand,
Arianna Huffington, Katty Kay, Helena Morrissey, Maria Shriver,
Justice Sonia Sotomayor, Meryl Streep, Aude de Thuin, and so
many more.⁷

As a director on the board of a large multinational firm, I 
hope my presence has positive effects on the company’s female 
employees. But I also appreciate that it matters how visible the 
director is. The more visible I am, the more likely the positive 
effects. So, whenever possible, I try to meet with women’s groups 
or speak at gender-related events sponsored by the firm. The title 
of a paper on the topic, “Seeing Is Believing... ,” resonates with 
me: people need to see counterstereotypical role models often for 
beliefs to change. In their paper, Nilanjana Dasgupta and Shaki 
Asgari show that exposure to female role models, encountered 
either through biographical information about famous female 
leaders or by seeing female professors in a classroom, decreased 
women’s stereotypical beliefs about themselves. In both a women’s 
college and a coeducational college, the greater the proportion of 
female faculty, the more female students were likely to associate 
women with leadership and with math. 

While a number of studies have shown correlations between 
female students’ achievements and the gender of the instructor, 
particularly in male-dominated fields, only recently have
researchers been able to exploit a naturally occurring experiment to
test causal links. The Air Force Academy randomly assigns
students to courses, allowing us to measure what impact the faculty’s
gender had on the students. In introductory STEM courses,
female students were more likely to choose a STEM major when
they were assigned to a female professor instead of a male professor.
The gender of the faculty had no effect, however, on male students’
choices.

Examining a nationally representative sample of 25,000 middle 
school students in the United States, Thomas Dee of Swarthmore 
College finds that the gender match between students and teachers 
matters dramatically. Among thirteen-year-olds, almost a third of 
the gender gap in reading was eliminated if the English teacher was 
male. The gap was reduced because of a gender match for boys, 
improving boys’ performance, and a gender mismatch for girls, 
hurting their performance in English. Similarly, half of the gender
gap in performance in science and the entire, albeit much
smaller, gender gap in math were eliminated if the teacher in
those subjects was a woman. Given that most teachers are female,
83 percent at the time of this study, and often comprise about half 
of the workforce in math and sciences, Dee concludes that “the 
gender dynamics between teachers and students at this level amplify 
boys’ large underperformance in reading while attenuating the 
more modest underperformance of girls in math and science.”⁸

The dearth of role models can create self-fulfilling prophecies. 
Two studies in large law firms suggest the low number of female 
partners has a significant impact on women law associates’ careers. 
Kathleen McGinn and Katy Milkman had access to five years of 
personnel data and employee interviews from a US-based law firm. 
They found that retention of junior-level female employees 
was highly correlated with the number of female supervisors.
The fewer role models (who may also have served as mentors or
sponsors) there were, the fewer female associates were promoted
or stayed in the firm. What is more, Robin Ely’s study of a law 
firm suggests that scarce role models render the few that are vis- 
ible less useful: female associates assessing the lone two female law 
partners with wildly divergent habits confront a challenge that 
male associates assessing dozens of male law partners do not. 

A scarcity of role models can in fact promote gender inequality, 
even as a firm or organization works to address it. Employees were 
more likely to have left the law firm by their fifth year if the 
fraction of same-sex or same-race peers was larger in their work 
group. Apparently, law associates believed that their chances for 
promotion diminished the more people “like them” there were 
at their same level. If ten female associates see only one female 
partner, they could well make the inference that their chances of 
advancement to partner are quite limited. Women and minorities 
may assume that they are in a fierce competition with members of 
their own demographic group for implicitly reserved seats. 

Indeed, a very similar dynamic has been uncovered by a 
number of other studies. An examination of job interviews
in a large professional services firm found that being interviewed by
a woman hurt the female applicants whom the interviewer perceived as
most likely to turn into competitors, namely the most competent
women, and benefited the less able female applicants.

In Spain, academic promotions from assistant professor to as- 
sociate professor and then to full professor are determined by ran- 
domly created evaluation committees, leading to random variation 
in the gender composition of the committees. It turns out that
female associate professors evaluating assistant professors for
promotion were more likely to be in favor of promoting male junior
colleagues. But this effect was only evident when evaluator and
candidate were at the same institution. It appears as if the
evaluator again feared same-sex competition, perhaps assuming some
implicit gender quota. In fact, if a male evaluator was replaced by
a female evaluator of the same institution, this decreased the
likelihood that a junior female candidate was promoted by 38 percent.
It did not affect female candidates from other institutions. Nor did 
the researchers observe any of this behavior for promotions to full 
professor. When full professors evaluate associate professors to join 
their ranks, they need not fear same-sex competition. Full pro- 
fessor is the highest rank possible in academia, and at that point, 
evaluators, looking for friends and people like themselves, exhib- 
ited an in-group preference, favoring candidates of the same sex 
and academic network. 

Both biases, in-group preferences and in-group discrimination, 
obviously are bad for an organization hoping to hire and promote 
the best talent. The evidence increasingly suggests that in-group 
discrimination is driven by a fear of same-sex competition, or 
the concern that there are only a fixed number of positions 
available for women, and a woman promoting another highly 
capable woman has only made the competition for one of those 
positions more fierce. In contrast, a dislike of in-group members 
appears to turn into a preference for them once people have climbed 
up the career ladder. To what degree this is caused by the scarcity 
of role models, and consequently the scarcity of perceived slots, is 
an open question. Maybe male nurses or teachers would also feel 
more threatened by same-sex competition? Unfortunately, no 
research on this question has been conducted yet.⁹

Role models are everywhere, of course, starting with parents. 
Mothers are often the first role models for young girls. Based on
data from an eighteen-year panel study, researchers concluded that
the egalitarianism of people’s views is related to whether or not their
mother worked outside of the home. Adolescents seem to be par- 
ticularly influenced by their mom’s employment status. Interest- 
ingly, women’s willingness to work can also be traced back to the 
proportion of men with working mothers. Similar to the Indian 
evidence, exposure to working women, in this case their own 
mothers, seems to have made men more accepting of women’s par- 
ticipation in the labor force. 

Several researchers built their analyses on the efforts during 
World War II to recruit women to the labor force. Mobilizing men 
into the armed services during the war affected more than just the 
generation of women who were consequently pulled into the labor 
force. Its effects were longer lasting. 

The circumstances arising during World War II were unpre- 
cedented. Labor demand alone would have pulled some women 
into the workforce. But as George Akerlof, a 2001 Nobel Lau- 
reate in Economics, and Rachel Kranton argue, role models sup- 
ported by an extensive marketing campaign were necessary to help 
women picture how they could take on a “man’s job” without 
losing their “femininity.” Rosie the Riveter was born and has since 
become a cultural icon in the United States. Rosie’s British equiv- 
alent is Ruby Loftus, whose picture now is part of the Imperial 
War Museums art collection. Little considered at the time, if at all,
was how these working women would influence, as role models,
younger generations of men and women who watched them head
off to work.¹⁰

Another fascinating and understudied issue is the influence 
daughters have on their parents. Justice Harry Blackmun, a
Republican who served on the US Supreme Court from 1970 until
1994, is best known for writing the court’s opinion in Roe v. Wade,
the landmark decision decriminalizing abortion in 1973. He would
become one of the most liberal justices in the court’s history, 
writing in Stanton v. Stanton, which in 1975 ruled that a state could 
not specify different ages of adulthood for men and women: “A 
child, male or female, is still a child... No longer is the female 
destined solely for the home and the rearing of the family, and only 
the male for the marketplace and the world of ideas . . . If a
specified age of minority is required for the boy in order to assure him
parental support while he attains his education and training, so, 
too, is it for the girl.” 

Justice Blackmun was close to his family, including his daughter 
Sally. When his archives became public in the winter of 2004, Sally 
talked with Women’s eNews about her father, and shared some of 
her own story. She was nineteen and a student at Skidmore Col- 
lege when she found out in 1966 that she was pregnant. “It was 
one of those things I was not at all proud of, that I was not at all 
pleased with myself about. It was a big disappointment to my 
parents . . . I did what so many young women of my era did. I quit
college and married my 20-year-old college boyfriend. It was a 
decision that I might have made differently, had Roe v. Wade been 
around.” 

A few weeks after the wedding, Sally had a miscarriage. But 
it was too late to return to college. Instead, she joined her hus- 
band, then living in a different state. They got divorced six years 
later and Sally eventually completed college, became a corpo- 
rate lawyer, remarried, and had two daughters. She describes 
how, nine years after her unexpected pregnancy, her dad sought 
his family’s input in Roe v. Wade: “Roe was a case that Dad
struggled with. It was a case that he asked his daughters’ and wife’s
opinion about.”

Might his daughter’s experience have influenced him? Maybe.
Using data from almost 1,000 gender-related cases of the US
Courts of Appeals and information on the makeup of the
families of 224 judges sitting on those courts, Adam Glynn and my
colleague Maya Sen found that judges with daughters are more 
likely to support women’s causes than judges with sons only (con- 
trolling for the total number of children). It appears that Repub- 
lican judges, much like Justice Blackmun, are more affected by the 
gender of their children than their Democratic peers: their opin- 
ions demonstrate a more marked shift in support of women’s causes 
when they are the parents of girls.¹¹

Glynn and Sen’s work was inspired by Ebonya Washington of 
Yale University, who took a closer look at politics to examine 
whether decisions by male members of the US Congress were af- 
fected by the gender of their children. Indeed, holding the total 
number of children constant, each daughter significantly increases 
a congressman’s likelihood of voting liberally, particularly in re- 
gard to issues involving reproductive rights. In addition to im- 
pacting legal and political decisions, daughters also play a role in 
business. Evidence from Denmark suggests that male CEOs 
who have daughters, and in particular firstborn daughters, are 
associated with a difference in female employees’ wages. In this 
study, the more daughters a Danish CEO has, the better his
employees are paid.¹²

Enlisting men as agents for change is the goal of a number of 
innovative initiatives, including the UN’s HeForShe campaign, a 
“solidarity movement for gender equality that brings together one 
half of humanity in support of the other half of humanity, for the
benefit of all,” represented by the actor Emma Watson. Australia’s
Male Champions of Change “use their individual and collective
leadership to elevate gender equality as an issue of national and 
international social and economic importance.” The organization 
was founded by Elizabeth Broderick, Australian sex discrimina- 
tion commissioner from 2007 to 2015. Many other organizations, 
too numerous to list here, honor women as role models.¹³

“Seeing is believing,” whether to inspire women in India to
run for public office and men to vote for them, or to help male 
politicians walk in women’s shoes in the United States. The evi- 
dence is overwhelming that role models influence behavior, 
whether daughters affecting fathers, female village council chiefs 
impacting male Indian voters, or female corporate board mem- 
bers influencing the gender makeup of top management. The 
stunning example of India’s amendment reinforces the promise of 
quotas. Even as we unpack the concerns quotas still legitimately 
raise (to be discussed in the next chapter), we can acknowledge 
that they offer a potentially influential tool. And while increasing 
the number of female partners in law firms has proven and will 
continue to prove difficult, every woman in a position of promi- 
nence may choose to act as a role model. At a minimum, if you 
care about gender equality or are a woman weighing job offers or 
candidates to vote for, learning how many daughters a CEO or can- 
didate has is not a bad idea. 

This is why leaders across Australia received a present in 
2014—“Daughter Water.” Labeled water bottles were distributed 
with this explanation: “Women being paid fairly shouldn't hinge on 
a CEO having a daughter, and it doesn’t need to. The Workplace 
Gender Equality Agency has all the tools employers and employees 
need to make sure gender bias plays no part in pay decisions.” 


Daughter Water, created by the Australian Workplace Gender Equality
Agency in partnership with DDB

Designing Gender Equality: Create Role Models


Diversify the portraits on the walls of your organizations. 
Increase the fraction of counterstereotypical people in 
positions of leadership, through quotas or other means. 
Seeing is believing. 

Know that fathers with daughters are more likely to care 
about gender equality. 


11 


Crafting Groups 


Around the world, microfinance institutions provide credit to the 
poor. About three-quarters of their customers are female: why? For 
starters, because women are among the poorest of the poor, their need
for credit is acute. But creditors seek female customers for a more
pragmatic reason: they are more likely than men to repay their
loans. Micro-credit groups are not alone in marketing their
services to women. Rotating Savings and Credit Associations, or
ROSCAs, are groups in which members make regular contributions
to a fund that is then distributed among the participants,
who are overwhelmingly women. In Kenya, for example,
public goods such as schools are often provided by women through 
harambee, the Swahili expression for “let’s pull together.” 

When thinking about diversity, we must understand why, 
sometimes, people are drawn to homogeneity. Why do women- 
only groups appear to be particularly adept at loan repayment, 
cooperation, and contributions to public goods? Seeking answers, 
Fiona Greig traveled to a slum in Nairobi, Kenya, and ran a
number of experiments. The female slum dwellers were indeed
more likely to cooperate with other women than with men because
they expected women to be more cooperative. It turned out that 
they were too pessimistic about their male counterparts, who the 
experiment showed would have been more cooperative if given 
the chance. But expectations became self-fulfilling prophecies: 
women were more likely to trust other women, and were rewarded 
in return for such cooperation.¹

What works in a Nairobi slum also works on an American 
game show. Friend or Foe, a variant of the classic prisoner’s dilemma 
studied extensively by game theorists and behavioral scientists, 
started airing in the United States in 2002. Two players have 
to simultaneously decide whether to play “Friend” or whether to 
play “Foe.” If both of them play “Friend,” they share a pot of 
money. If both play “Foe,” they both get nothing. And if one plays 
“Friend” and the other plays “Foe,” the “Foe” player gets the whole 
pot of money and the “Friend” player gets nothing. Thus, people 
are motivated to play “Foe” to maximize their earnings, but must 
fear that the other person will do the same, leaving both empty 
handed. But cooperation is risky, of course. If you choose “Friend” 
and your counterpart chooses “Foe,” you will have been played for 
a sucker. The stakes varied between $200 and $16,400, and across 
315 games, 630 contestants made over $700,000. 
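
The payoff rules just described can be written down as a short Python sketch. The function names `payouts` and `foe_never_pays_less` are hypothetical, and the even split when both players choose “Friend” is an assumption, since the text says only that they share the pot.

```python
FRIEND, FOE = "Friend", "Foe"


def payouts(pot, choice_a, choice_b):
    """Return (payout_a, payout_b) for one round of the game.

    Assumption: when both play "Friend," the pot is split evenly.
    """
    if choice_a == FRIEND and choice_b == FRIEND:
        return pot / 2, pot / 2
    if choice_a == FOE and choice_b == FOE:
        return 0, 0
    if choice_a == FOE:
        return pot, 0  # A plays "Foe" against a "Friend" and takes it all
    return 0, pot      # A plays "Friend" against a "Foe" and gets nothing


def foe_never_pays_less(pot):
    """Check that "Foe" earns at least as much as "Friend" against either reply."""
    return all(
        payouts(pot, FOE, other)[0] >= payouts(pot, FRIEND, other)[0]
        for other in (FRIEND, FOE)
    )
```

Under these rules, `foe_never_pays_less` holds for any pot: against a “Friend,” playing “Foe” wins the whole pot rather than a share of it, and against a “Foe” both choices pay nothing. That is the pull toward “Foe” described above, and it is why mutual cooperation depends on what each player expects of the other.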

About half of the participants typically played “Friend.” Spe- 
cifically, they cooperated if they had reason to believe that their 
counterpart would cooperate as well. But how could they know? 
What clues did they look for? Contestants met briefly, and they 
had to answer trivia questions together, but otherwise they knew 
very little about each other. They could, however, watch earlier 
episodes and learn what typical cooperators looked like. What
they would have learned is that cooperators were more likely to be female.
Accordingly, pairs of women were most likely to expect each
other to cooperate, and many fulfilled these expectations.²

What works like a charm in this version of a prisoner’s dilemma 
is not necessarily helpful in other contexts. In their review of the 
experimental evidence on the role of gender and cooperativeness, 
Rachel Croson and Uri Gneezy stress that the results seem to 
depend on social cues about appropriate behavior that women in 
particular take from their environments. Such cues help them 
form expectations. But, of course, sometimes expectations clash. 
Consider the following. Imagine that you and I are negotiating 
over who gets what amount of a fixed sum of money. Because I am 
a woman, you might expect more from me—and because you 
are a woman, I might expect more from you. If those expectations 
go unmet, we could be at an impasse. This is the exact pattern 
found in an experiment where one of two players, the proposer, 
was given a pot of money and had to decide how much of it to 
offer to a second person, the responder. In the traditional version 
of this game, the ultimatum game, the responder then decides 
whether or not to accept the offer. If he or she accepts, the deal 
stays as proposed. If he or she rejects, neither of the two players 
gets anything. In this particular experiment, the responders had 
to write down the minimum amount they would accept before 
they saw what they were offered. 
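
The rules of this version of the ultimatum game can be sketched in a few lines. The function name and the pot size of 100 are hypothetical choices for illustration, not details of the experiment.

```python
def ultimatum_outcome(pot, offer, minimum_acceptable):
    """Return (proposer_payout, responder_payout).

    The responder commits to a minimum before seeing the offer.
    The deal goes through only if the offer meets that minimum.
    """
    if not 0 <= offer <= pot:
        raise ValueError("offer must lie between 0 and the pot")
    if offer >= minimum_acceptable:
        return pot - offer, offer
    # Expectations clash: neither player gets anything.
    return 0, 0


print(ultimatum_outcome(100, offer=43, minimum_acceptable=42))
print(ultimatum_outcome(100, offer=40, minimum_acceptable=42))
```

An offer just below the responder’s stated minimum wipes out the entire pot for both players, so even small mismatches in expectations are costly.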

It turns out both male and female responders expected more 
from women. They set higher minimums when their proposer was 
a woman than when he was a man. But expectations clashed. Both 
male and female proposers offered less to women than to men. In 
fact, the clash was worst for female-female pairs: Female proposers 
expected kinder, more generous female responders, and on average
expected them to accept a 43 percent share of the pie. In comparison,
women offered men 51 percent of the pie. But the same
was true for responders: female responders expected kinder, more 
generous female proposers and on average, demanded a 42 percent 
share of the pie. When paired with men, they demanded only 
28 percent. 

In this experiment, female-female dyads stood out, with female 
pairs clashing 23 percent of the time with the result that both 
participants ended up with nothing. By contrast, all other gender 
pairings hardly clashed at all. Men expected more from women—and 
they got more. Women expected less from men—and they got 
slightly less, but were content with it. Male-male pairs worked 
out because men expected the least from male proposers—and 
were treated better than expected. Homogeneity helps in some 
contexts but not in others—and some of these contexts are quite 
controversial. For example, consider the hotly debated question 
of sex-segregated education.3

In what is widely considered the most significant policy change 
since the prohibition of sex discrimination in educational institu- 
tions, Title IX, which passed in 1972, the George W. Bush ad- 
ministration in 2006 relaxed constraints on single-sex education 
in public schools. It ruled that districts could create single-sex 
classes and schools as long as the district offered coeducational 
schooling of equivalent quality and students and their parents 
could choose which form of education they wanted. Though 
heavily criticized by some civil rights and women’s groups, who 
feared it would legitimize discrimination in schools, the ruling 
had sufficient backing from Republicans and Democrats and 
passed. Subsequently, the number of public schools catering to 
boys or girls exclusively rose substantially. According to some
estimates, in 2015 there were more than 1,000 public single-sex schools
in the United States, though compared to the country’s nearly
100,000 public schools, they remain a small fraction.4

In addition, and by statistical necessity, if single-sex schools 
are provided primarily for one gender, in this case girls, they will 
increase the gender imbalance in other schools, leading to an 
overproportional share of boys. This matters. Caroline Hoxby of 
Stanford estimated the effect of the gender composition of the 
classroom on academic achievement in Texas. Both girls and boys 
did better when there were more female students in the classroom, 
including in math, where on average girls had lower test scores 
than boys. This is not just a spillover effect, with better students 
benefiting the less skilled students. This is a girl effect—a pattern 
that has also been found in Israeli and Spanish schools. 

The effect varies somewhat by grade, subject, and location, 
but overall the studies estimating peer effects in classrooms sug- 
gest that while girls might well benefit from single-sex class- 
rooms, they hurt boys. Both genders, but particularly boys, ben- 
efit from being surrounded by girls. Although explaining why 
requires further research, the Israeli study shows that a larger 
share of female students led to more satisfied students, less ex- 
hausted teachers, less classroom disruption and violence, as well as 
better relationships between teachers and students and between 
students.5

How groups are composed matters. Too much data has been 
collected for anyone to form them thoughtlessly. How you design 
work teams, classrooms, and corporate boards should at a min- 
imum be accomplished with an awareness of potential gender dy- 
namics. And gender is present always, whether the group consists 
of a single sex or all one sex but for a lone member, or a neat (and
rare) equally split group.

Knowing about group dynamics, you can also create groups
strategically to meet a particular objective. Consider the work on 
peer effects that supports the notion that exposure to the “out- 
group” can improve cross-group relations. A group of researchers 
looked at what happened when students with different racial back- 
grounds were randomly assigned to live together at the beginning 
of their college year. White students who were randomly assigned 
to live with African-American students were more likely to sup- 
port affirmative action and endorse diversity on campus. They also 
reported that they were more comfortable interacting with people 
of color and were more likely to do so. (Due to the small sample
size, the authors could not draw inferences on how racial diversity
impacted people of color.) And the random assignment of
roommates at Dartmouth College has been shown to substantially
influence academic effort and social behaviors. Group composition
matters.

Looking beyond the dorm room, where incentives for getting 
along are great, experiments suggest that size also matters. Based 
on datasets about friendships and social interactions at the class- 
room and the school level, researchers conclude that adolescents 
in small schools have a more diverse set of friends. In bigger schools, 
students have a larger range of potential friends to choose from 
and opt to cluster by sex, race, age, and socioeconomic status, 
leading to segregation and cliques. In short, we like being sur- 
rounded by similar others. But if people do not have much choice, 
social category—based differences lose their relevance. And contact 
across groups matters. A meta-analysis of more than 500 studies 
and over 700 independent samples shows that contact typically 
reduces intergroup prejudice, supporting what in psychology is
known as the intergroup contact theory.6

While we are not inclined to seek diversity unless we have to,
there would be lots of advantages if we did. A large number of labo- 
ratory studies suggest that diversity can increase productivity. A 
particularly comprehensive study assigned about 700 people to 
groups of two to five and had them participate in a variety of tasks, 
including brainstorming, solving visual puzzles, making moral 
judgments, conducting negotiations, and playing a collective game 
of checkers against a computer that took up to five hours. The 
researchers, Anita Williams Woolley of Carnegie Mellon and col- 
laborators, used a team’s performance on these tasks to calculate its 
“collective intelligence” factor, which by assessing how well a 
group does on a set of tasks can help predict the group’s future 
performance on different tasks. 

It turns out that individual team members’ average or maximum
intelligence is a bad predictor of a group’s collective intelligence.
Instead, the higher a group scored on social sensitivity,
the more opportunities to speak were equally distributed among
members, and the larger the share of women in the group, the
higher its collective intelligence. Or, as Woolley and her colleagues
put it, “collective intelligence of the group as a whole has predic- 
tive power above and beyond what can be explained by knowing 
the abilities of the individual group members.” The importance 
of including women in a group so that it could reach its potential 
came as a surprise to the authors of the study. It appears as if it was 
partly due to female team members scoring higher on the social 
sensitivity measure than male team members, thus providing the 
necessary glue to connect all members’ contributions and create a 
whole that exceeded the sum of its parts. 

Such results are tantalizing, suggesting that further research
might be able to turn collective intelligence into a diagnostic tool,
allowing us to predict which teams will perform well and which
will struggle. In addition, tools might be developed to help teams 
manage themselves—ensuring a more equitable share of speaking 
times, for example—producing the social sensitivity to allow a 
group to benefit from everyone’s contributions.7

Groups can also rely on decision rules that make sure every 
opinion counts. Decision rules could stipulate that a team must
reach a decision by consensus, or that a decision need only be
backed by a majority of the team. Comparing unanimity with
majority rules, experimental
research by Christopher Karpowitz, Tali Mendelberg, and Lee 
Shaker shows that unanimity leads not only to every vote counting 
but also to a broader sharing of “voice,” with more people par- 
ticipating in the deliberations. Unanimity rules appear to be 
particularly important for groups in which women are in the mi- 
nority. The researchers conclude by arguing that “deliberative de- 
sign can avoid inequality by fitting institutional procedure to the 
social context of the situation.” 

In addition, and somewhat counterintuitively, groups might 
want to constrain themselves from speaking freely. The study 
“Creativity from Constraint” presents experimental evidence on 
how imposing a norm of political correctness (PC) that specifies 
how men and women should interact with each other enhances 
creativity in mixed-sex groups. The PC norm increased the ex- 
change of ideas by clarifying the rules of engagement and pro- 
viding assurance to those, predominantly women, for whom 
speaking up was associated with counterstereotypical behavior.8

Although randomly formed groups are easily created in the 
laboratory, when doing research in organizations, we typically 
have to work with existing teams. But while selection issues are
real, organizational settings have other advantages. Typically, the
time frame of an organization’s team is longer and the stakes
higher than for one created in the laboratory. Some of the dynamics
seen in the laboratory emerge in the field, too. In one study ex- 
amining team performance and employee satisfaction in a global 
professional services firm, more gender-diverse offices generated 
more revenue. The same study also found that employees in more 
diverse teams were less comfortable and less willing to cooperate, 
though the latter effect was mitigated when employees believed 
that the firm endorsed diversity.9

In a similar finding, gender diversity in senior management 
teams has been found to be associated with better firm perfor- 
mance under some conditions but not under others. As always in 
correlational analyses, causality could go in either direction: high- 
performing firms might attract an over-proportional share of 
women, for example. To overcome this shortcoming, one study 
examined start-up firms where the sequence of events can be better 
controlled. Using a large Austrian database, the study’s authors 
looked at whether there was a relationship between women being 
among a start-up’s first highest-paid hires and the firm’s survival. 
Controlling for a large number of additional variables, they found 
that firms with at least one woman among the first hires were more 
successful and stayed longer in the market than all-male start-ups. 
Similar positive impacts of gender diversity on firm performance— 
measured by sales, profits, and earnings per share—were found for 
randomly created student start-ups in the Netherlands.10

In short, diversity can lead to better performance—but not al- 
ways. When might we reasonably expect its benefits? Suppose 
that you need to create a new team to work on a special project 
and have five positions to fill. You evaluate the applicants based
on ten different characteristics and assign scores from one (worst)
to ten (best) for each characteristic. It turns out that exactly five
of the applicants have a total score of 91, with the best possible 
score on nine of the attributes that you are looking at and the worst 
possible score on one of them. These are your highest total scorers 
and you might consider yourself lucky to get all five on your team. 
Or should you? While the five are the highest total scorers, they 
may not provide you with the diversity you need. None of the five 
scored high on one of the attributes you care about, say creativity. 
Shouldn’t you choose at least one or possibly even more than one 
candidate with a lower total score but high marks on creativity? 
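
The arithmetic behind this thought experiment can be checked with a short sketch. The attribute names, the sixth candidate, and her scores are hypothetical. The sketch also assumes that all five top scorers have their one low mark on the same attribute, creativity.

```python
ATTRIBUTES = ["creativity"] + [f"attribute_{i}" for i in range(2, 11)]


def applicant(default, **overrides):
    """Build a score sheet with one default score and named exceptions."""
    scores = {name: default for name in ATTRIBUTES}
    scores.update(overrides)
    return scores


def total(person):
    return sum(person.values())


def weakest_attribute(team):
    """Return the attribute on which the team's best member scores lowest."""
    best = {name: max(member[name] for member in team) for name in ATTRIBUTES}
    name = min(best, key=best.get)
    return name, best[name]


# Five stars: ten points on nine attributes, one point on creativity.
stars = [applicant(10, creativity=1) for _ in range(5)]
# Hypothetical alternative: lower total, but top marks on creativity.
creative = applicant(7, creativity=10)

print(total(stars[0]), total(creative))
print(weakest_attribute(stars))
print(weakest_attribute(stars[:4] + [creative]))
```

Each star totals 9 × 10 + 1 = 91 points, yet a team made up only of stars has nobody who scores above one on creativity. Swapping a single star for the lower-scoring creative candidate closes that gap without giving up top marks on any other attribute.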

One study, aptly entitled “Too Many Cooks Spoil the Broth: 
How High-Status Individuals Decrease Group Effectiveness,” 
looked at equity research analysts on Wall Street and found that 
groups enjoyed ever-smaller benefits from including more and 
more star performers. As the share of stars increased, the benefit 
each additional star brought the team declined and eventually even 
became negative. 

When we build teams, we look for complements, not substi- 
tutes. The diversity of viewpoints may trump average excellence 
when we have to solve problems collectively. In his wonderful 
book The Difference, Scott Page helps us understand why a suc- 
cessful team does not necessarily consist only of star players. Both 
ability and diversity are required for collective intelligence to reach 
its potential. Of course, complements must overlap. To state the 
obvious, an excellent Mandarin speaker without any knowledge 
of English and a superb English speaker without any knowl- 
edge of Mandarin are complements but do not make the whole 
greater than the sum of its parts.11

For gender diversity to increase group performance you need 
team members whose different perspectives add value while
keeping the cost of coordination as low as possible. Indeed, diversity
theories in organizational behavior, psychology, sociology, and
economics expect performance to improve the more diversity is 
based on different sets of competencies relevant to the task. The 
more relevant knowledge increases with each additional team 
member, the more positive the impact. In contrast, the more dif- 
ferences are based on categories not related to the task at hand or 
on deeply held values, the more diversity is expected to hurt team 
performance. Of course, variables can interact in many different 
ways and correcting for most of them is challenging. In certain 
fields, these differences can even prove insurmountable.12

Data from more than 2,000 management teams at
several different organizations in the equity mutual fund industry
in the United States from 1996 to 2003 show that differences indeed hurt
team performance: homogenous teams significantly outperformed
mixed-sex teams. Every team’s job was to manage an equity fund, 
with expectations clearly defined and performance easily measur- 
able based on fund returns. High fund performance was rewarded 
by compensation and promotion. Many teams worked together 
for several years, optimizing team performance and developing 
relationships. 

With only about 10 percent of fund managers being female at 
the time of the study, homogeneity basically meant all-male teams. 
Only 2 percent of the teams consisted of women only. With such 
little variation, the researchers had to measure gender diversity 
with a dummy variable that took the value of one if the teams con- 
sisted of men and women, and zero if they were exclusively male 
or (much more rarely) exclusively female. Furthermore, if a team
was diverse, in almost all cases female managers constituted a small
minority of a team. In this heavily skewed micro- and macro-
environment, the study showed that heterogeneity did not pay.13

Organizations pondering how to best create teams should focus 
on “critical mass.” A seminal paper that my colleague Rosabeth 
Moss Kanter of Harvard Business School wrote in 1977 suggests 
that one of the key factors determining team performance is the 
relative share of people from different demographic categories rep- 
resented, or a group’s critical mass. Having studied American 
corporations in the 1970s, she observed that the “relative numbers 
of socially and culturally different people in a group” were “crit- 
ical in shaping interaction dynamics in group life.” In groups dom- 
inated by one social type, as in the mutual fund industry, mem- 
bers of the minority group are likely to be treated as tokens among 
their peers. Their minority status makes them visible and easily 
reduced to their demographic characteristics. Regarded as sym- 
bolic representatives of their social category, they may be unable 
to fully contribute their complementary expertise. 

Many of you have likely experienced what it feels like to be 
the obvious outlier in a group, perhaps due to your sex, race, eth- 
nicity, nationality, religion, or sexual or political orientation. The 
Hispanic accountant is very often considered the spokesperson for 
Hispanics rather than the expert in accounting and the Chinese 
professor of computer science teaching in the United States 
becomes the go-to person on all things Chinese. Tokenism of 
this sort is uncomfortable and can easily undermine the group 
member’s credibility. Differences tend to be stressed, sometimes 
compelling the token member to adopt the majority’s style and 
opinions. In addition, tokens may feel the need to overachieve 
to prove their worth. What often is referred to as the queen bee
syndrome, describing the lonely woman at the top, can be the result.
Rather than pave the way for those who follow them, token
members look up to their majority peers, and assimilate and dis- 
tance themselves from new entrants of their own social cate- 
gory. This seems particularly true for first-generation women in 
counterstereotypical roles. 

In more balanced groups, stereotypes lose their importance 
and minority members are regarded as individuals rather than 
just token representatives. While the exact tipping point from 
scarcity to balance is hard to determine, it appears as if equal 
representation is not required to change experiences and team 
performance. Many argue that a critical mass of one-third in 
relative terms and at least three in absolute numbers is required 
to move groups from being haunted by the dynamics of social 
categorization to being able to seize the benefits of diversity.14

A decade after Rosabeth Moss Kanter’s article, the political sci- 
entist Drude Dahlerup applied critical mass theory to politics. 
Politicians in committees or parliament appear to be affected by 
relative numbers in the same way employees are influenced by the 
sex composition of their work groups. In addition, in politics the 
business case for making sure that we seize the benefits of diver- 
sity might be even more critical. If male and female voters have 
different policy preferences, they may care about the sex of their 
representatives for substantive reasons. Female suffrage, for ex- 
ample, has been shown to lead to an increase in spending on health 
care in the United States, and reserved seats on local governing 
committees for various castes in India resulted in more support of 
these groups.15

Tokenism, critical mass, and queen bee syndrome present a 
challenge to many organizations that start off with skewed demographics.
A boutique eighty-member consulting firm with only
eight women will have a hard time putting into practice the in- 
sight that homogeneity and balanced heterogeneity both likely 
outperform skewed heterogeneity. When Katherine Phillips and 
Damon Phillips of Columbia University studied the performance 
of NHL hockey teams between 1988 and 1998, they discovered 
this pattern. Focusing on national heterogeneity in teams with 
twenty-eight different nationalities represented, the hockey teams 
won more games when their heterogeneity was either low or high. 
Relatively homogenous groups had few coordination issues and 
extremely heterogeneous ones had no choice but to deal with 
them. With many different nationalities represented on a team, 
fault lines started to blur. If almost every player comes from a dif- 
ferent country, social categorization based on country of origin 
loses its importance. In contrast, if only a couple of nationalities 
are represented on a team, players are attracted to others like them, 
inviting inter-group differentiation and conflict.16

When I form teams in my classes, I take these considerations 
into account. My students often express surprise when they find 
themselves in same-sex teams. In the executive program for the 
World Economic Forum’s Young Global Leaders at the Kennedy 
School, such assignments led to outright protest. People wanted 
to be in diverse teams. I had to explain to them that not everyone 
could be on a diverse team without turning some people into 
“token” group members. A rich classroom discussion broke out, 
in which I shared many of the insights of this chapter and distilled 
for them a cheat sheet of group formation: 


• If a task involves coordination, say the provision of
a public good like clean water or better health care,
homogenous groups can be helpful. All-women teams,
for example, outperformed mixed and all-male teams in 
Friend or Foe because they correctly believed that women 
would be more likely to cooperate than men, leading to a 
virtuous cycle. 

• If a task involves individual problem-solving, say test-taking,
be aware of peer effects. Diversity might produce
spillovers (or, more formally, “externalities”) affecting, for 
example, how students perform in a class. If one group is 
more likely to work hard or disrupt less, as has been 
found to be true of girls, having them over-represented 
can help others, in this case boys, perform better. 

• If the task involves collective problem-solving, say the
building of a house, you should go for heterogeneous
groups where the individual knowledge and perspectives 
of group members complement each other. A particularly 
useful skill to incorporate in groups is listening and 
bridge-building, both shown to be correlated with the 
fraction of women present. And to reduce the challenges 
associated with social category diversity, you need to 
make sure demographic minorities are represented by a 
critical mass of members. If you start out with a popula- 
tion of, say, 20 percent men and 80 percent women and 
then want to create work teams, do not allocate people 
proportionally. Instead, form a few balanced teams and
assign the rest of the women to all-female groups.
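
The allocation rule in the last point can be sketched as a short routine. This is one possible reading, not a prescribed method: the function name `form_teams` is hypothetical, “balanced” is taken to mean an even split, and the numbers (18 men, 72 women, teams of six) are chosen for illustration so that each mixed team has three men, in line with the critical mass of at least three mentioned earlier in the chapter.

```python
def form_teams(minority, majority, team_size):
    """Form balanced mixed teams first, then same-group teams.

    Assumption: a balanced team takes half its members (rounded down)
    from the minority group and the rest from the majority group.
    """
    if team_size < 2:
        raise ValueError("team_size must be at least 2")
    minority, majority = list(minority), list(majority)
    from_minority = team_size // 2
    from_majority = team_size - from_minority
    teams = []
    while len(minority) >= from_minority and len(majority) >= from_majority:
        teams.append(minority[:from_minority] + majority[:from_majority])
        minority = minority[from_minority:]
        majority = majority[from_majority:]
    # Everyone left over goes into single-group teams, so nobody is a token.
    for rest in (minority, majority):
        for start in range(0, len(rest), team_size):
            teams.append(rest[start:start + team_size])
    return teams


men = [f"M{i}" for i in range(18)]      # 20 percent of 90 people
women = [f"W{i}" for i in range(72)]    # 80 percent of 90 people
teams = form_teams(men, women, team_size=6)
mixed = [t for t in teams if any(p.startswith("M") for p in t)]
print(len(teams), len(mixed))
```

Proportional allocation would have put one or two men on every team of six. Here the men are concentrated in six evenly split teams and the remaining 54 women form nine all-female teams.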


Does perceived fairness matter? Students and participants in
executive education programs always express concern with how
their groups were formed. Was it done randomly, based on merit,
or using some intricate algorithm? They care for many different
reasons. An obvious one is that in some cases their grade depends 
on their team’s performance. But they also look for subtle clues to 
help them better understand their place and purpose in the group. 
Finally, they also care about fairness. Much research suggests that 
people are more likely to accept an unfavorable outcome if they 
believe the process was fair. But what characterizes a fair process? 
Surprisingly, what many consider the epitome of fairness, a random 
process, is not typically what people like best. Bruno Frey of the 
University of Zurich and colleagues have studied this question in 
many different environments, and what they typically find is that 
bureaucratic or traditional procedures such as “first-come first- 
served” win the day. Much to my surprise, my students often 
prefer that I form the teams rather than that they create their own.17

One much-debated bureaucratic mechanism we mentioned in 
Chapter 10 for composing teams is quotas. Guaranteed political 
seats in Indian villages and corporate board quotas in Norwegian 
firms have created role models. At the same time, a laboratory 
study that Johanna Mollerstrom conducted in Boston found that 
people did not perceive quotas as fair. When team membership 
was decided by quotas as compared to a random process, it turned 
out that people were less willing to cooperate with one another. 
Similar evidence stems from an experiment conducted in Australia 
where study participants in the quota treatment went as far as to 
sabotage each other. In addition, quotas do not always achieve their 
purpose. For example, though Spain passed a law in 2007 man- 
dating that at least 40 percent of each sex be represented in 
national parliamentary elections, actual numbers have fallen short. 
And the women who were included were not always helped.
One study shows that parties positioned their female candidates
disadvantageously. For example, in a Senate election women
were placed by their party in only 20 percent of the winnable 
seats whereas 53 percent of the slots that were expected to be lost 
were assigned to women. Similar issues have been raised in France 
where an equal representation of men and women on candidate 
lists was mandated in 2000.18

Much has been written about the advantages and disadvantages 
of quotas. Predictions about their overall performance tend to 
depend on the theory of the world people have. If you think that 
there is a “pipeline” problem, that there are too few qualified 
women for a given job, or that such mandates undermine the 
functioning of a team, you will expect quotas to decrease perfor- 
mance. On the other hand, if you believe that stereotypes keep 
qualified women from being selected, you will be optimistic about 
the impact of quotas. Which theory wins the day often depends 
on context. For example, pipeline issues are real in some fields, as 
a survey of the research on women’s underrepresentation in STEM 
fields suggests. Women make up only about 20 percent of those who
graduate with a bachelor’s degree or a doctorate from engineering schools in the
United States. Arguably, there are just not enough female engi- 
neers in the pipeline to dictate a quota in some circumstances— 
for example, a 40 percent quota for federally financed projects. 

However, inferring from this that quotas do not make sense 
more generally would be grossly misleading. In an elegant experi- 
ment that Muriel Niederle, Carmit Segal, and Lise Vesterlund 
conducted in the United States and which has now been replicated 
in other parts of the world, quotas were shown to induce more 
talented women to compete, those who should have competed all 
along but held back due to a lack of self-confidence and self-stereotyping.
In their study, there were enough women in the
pipeline; they just did not dare to put themselves forward.19

Concerns about adequate numbers of sufficiently skilled women 
and people of color in the pipeline are repeatedly raised in the 
context of affirmative action policies. There is reason to doubt 
the severity of such worries. In the United States, such policies 
have played an important role for federal contractors and, despite 
many people’s fears that affirmative action would negatively impact 
firms’ performance, a review of the evidence suggests no such ef- 
fects. There were enough qualified job candidates who had been 
formerly discriminated against to fill the open slots, and the firms 
were able to find and hire them. Examining how, researchers sur- 
veyed firms in four large US cities: Atlanta, Boston, Detroit, and 
Los Angeles. The firms reported that they had broadened their 
searches, looking in places that they had not looked before, and 
looking for candidates that were not the usual suspects. The result 
was a more diverse candidate pool to choose from. And it worked: 
comparing federal contractors with nonfederal contractors, the 
proportion of women employed rose substantially faster in some 
periods when firms were affected by the policy. According to 
the most comprehensive study to date by Fidan Ana Kurtulus of the 
University of Massachusetts, Amherst, analyzing data over the 
course of three decades between 1973 and 2003, however, the 
primary beneficiaries of affirmative action were African American
and Native American women and men.20

The pipeline concern is stubbornly persistent, however, 
and certainly having firms lower their standards to comply with 
a quota or affirmative action is in nobody’s interest. Among 
other things, it can further bolster negative stereotypes about the
discriminated-against group. Indeed, studies suggest that em- 
ployees hired under gender- or race-based policies experience stig- 
matization, both by others as well as by themselves, but that such 
negative stereotyping can be attenuated if merit-based criteria 
played a dominant role in the hiring decision. A two-stage pro- 
cess, thus, seems imaginable where first merit is determined and 
then members of certain demographic groups are preferentially 
treated.21

More important, perhaps, is that the pendulum appears to be 
swinging in the direction of the second quota theory: that quotas boost
the participation of well-qualified but previously underrepresented 
individuals. Today, more than half the countries in the world have 
adopted political quotas. They range from party quotas—a cer- 
tain representation of female candidates on party lists, whether vol- 
untarily adopted or mandated by law—to reserved seats for a 
fraction of women who must be represented in elected office. Some 
of this reflects the realization that the self-perpetuating effects of 
discrimination can only be broken if opportunities for the tradi- 
tionally discriminated-against are created. If people assume women 
are unsuited to leadership, women invest less in leadership training 
and seek out fewer leadership opportunities. And when they do 
seek to become leaders and confront the stereotype, they are less 
likely to be chosen. Quotas can short-circuit this cycle. Far from 
elevating the under-qualified, quotas prove in fact to broaden the 
pool of qualified candidates. 

This is perhaps why quotas have started to spread in the 
business world. In 2003 Norwegian legislation mandated that 
40 percent of each sex be represented on its corporate boards. This 
was followed by similar laws in Belgium, France, Germany, Iceland,
Italy, the Netherlands, and Spain. Board quotas and related
target-based schemes are currently under discussion in various
other places, including Brazil, Canada, the Philippines, Scotland, 
South Africa, the United Arab Emirates, and the European 
Union. In Germany, Chancellor Angela Merkel surprised many 
with a change of heart when in 2014 she affirmed the plan to in- 
troduce board quotas of 30 percent for the largest German com- 
panies in the Bundestag: “We can’t afford to do without the skills 
of women,” she said.

But was Chancellor Merkel right? Does the evidence suggest 
that a larger share of women on its corporate board is good for a 
company? The short answer is that based on the available data, it is 
almost impossible to prove either way. No study to date has been 
able to establish a causal relationship between corporate board di- 
versity and company performance. Boards are not created ran- 
domly nor imposed on firms randomly. If there was a relationship 
between board diversity and firm performance, we would not 
know whether it was the board that affected the company or 
whether the company influenced the composition of the board. 

Even though causality cannot be established with the available 
data, much research has gone into understanding whether there is 
a relationship between board diversity and company performance. 
Deborah Rhode and Amanda Packel of Stanford University pro- 
vide an excellent review. The evidence is mixed. A number of 
studies report positive correlations between the fraction of female 
directors and company performance. Taking a global perspective, 
the Credit Suisse Research Institute, for example, found a substan- 
tial gender diversity premium in an analysis of more than 2,300 
companies across the globe from 2005 to 2011—but only after the 
financial crisis in 2008. Maybe diversity is particularly relevant
in turbulent times? Miriam Schwartz-Ziv’s analysis of Israeli
companies suggests that critical mass mattered and that companies
with at least three female directors had higher ROEs (returns on 
equity) and net profit margins. Others have found no or nega- 
tive correlations between gender diversity and performance. 

Given the mixed evidence of individual studies, a meta-analysis 
combining the results of 140 studies is particularly helpful in this 
context. Across all studies, it finds a small positive relationship be- 
tween female board representation and company profitability (ac- 
counting returns). Market performance, on the other hand, was 
only positively related to board diversity in countries with greater 
gender parity (as measured by the World Economic Forum’s Global
Gender Gap score) and negatively otherwise. Investors’ evaluations 
of a firm’s future performance may well be influenced by gender 
norms prevalent in a given country. In more gender-equal coun- 
tries, they expected gender diversity on corporate boards to 
be a good thing; in less gender-equal countries, they saw it as a 
disadvantage. 

To put these findings into perspective, recall that generally 
there is little empirical evidence that any of the board character- 
istics we typically worry about, including board size, number of 
independent directors, time and effort spent by directors, director 
indemnification, director duties, or whether or not the CEO serves 
as board chair, are related to firm performance.

In addition, the above studies do not distinguish how diver- 
sity was brought about. Understandably, as gender quotas on cor- 
porate boards have only been introduced very recently, we know 
very little about their impact on company performance. The little 
we do know is based on Norwegian companies. There the evidence
suggests that the introduction of quotas had negative short-term
impacts, both on profits and on company valuation. One study
compares the profits of Norwegian companies after the introduction
of quotas with how well other Scandinavian companies did in the
same time frame. It finds that the introduction of
quotas led to an increase in spending on labor, driven by employ- 
ment levels, not compensation packages. Survey evidence supports 
the notion that female directors may indeed be more concerned 
with employees than their male counterparts. Could this have hurt 
the performance of companies affected by the quotas in Norway 
at a time when many other companies laid off people due to the 
financial crisis in 2008? 

We will never quite know what the channels of influence were 
as the research does not allow us to recreate what happened in 
these companies. There are too many possible variables: perhaps 
the female directors were indeed decisive, or their presence influ- 
enced opinions of male directors, or management reacted nega- 
tively to the gender quotas or the increased diversity on their 
boards. It is also possible that all this had little to do with the spe- 
cific composition of the boards and instead was the result of teams 
being newly formed. Teams, and in particular heterogenous teams, 
do become more effective over time. Richard Hackman, for ex- 
ample, reports that how long an airline crew has been flying 
together is a good predictor of aircraft “incidents”: 73 percent of
incidents occur when a crew flies together for the first time.

To be clear, quotas are not behavioral interventions, but they 
affect people through behavioral channels. Whether or not they 
should be introduced is a political decision, weighing their benefits
and their costs. Their beauty is that they change numbers
quickly, sparing the team the stereotyping and painful assimilation
processes that go along with a more incremental approach,
which can depress performance. Yet a two-stage process where
candidates are first reviewed for merit, ideally in blind evaluations,
seems advisable in order to address fairness considerations.

The composition of groups clearly matters. And while getting 
it right is not easy, some behavioral design principles can help us 
move in the right direction. The most important one is critical 
mass. When forming diverse teams, make sure every subgroup is 
represented by at least three people or makes up about a third of 
the total. When you next appraise the performance of your team 
members, take a moment to reflect on this. Women have been 
found to receive lower-quality performance evaluations than men 
in work groups where they compose less than 20 percent of the 
group. As their relative presence increased, so did the scores on 
their performance evaluations. Shortlists and the much-celebrated 
diverse slates are another place to look for potential improvements. 
Some executive search firms have pledged to always include a 
woman on their shortlists—but adding only one might not do the 
trick and could even backfire. If you cannot include more than 
one woman, keep groups homogenous. Creating token members 
is in nobody’s interest.

Finally, when you design a team, adhere to some basic prin- 
ciples that have been shown to enhance performance. Teams 
should not be too large. Several studies suggest that ideal teams 
consist of four to six members with high cognitive and low value 
diversity. And more important, do not fall prey to the problem of 
“groupthink.” Not only do team members tend to favor informa- 
tion supporting their initial views, but confirming evidence also 
makes them more confident—without increasing accuracy. Make 
it a point to seek challenging information and create inclusive 
processes that help you benefit from the diversity represented in
your team.

Designing Gender Equality—Create the Conditions
for Collective Intelligence


Combine average ability with complementary diversity of 
perspectives and expertise to maximize team 
performance. 

Include a critical mass of each subgroup in teams to avoid 
tokenism. 

Create inclusive group processes to allow for diverse 
perspectives to be contributed and heard, for example, by 
introducing unanimity rules or political correctness
norms.


12 


Shaping Norms 


In 2011, the UK Behavioral Insights Team sent more than 100,000 
letters to British citizens reminding them that they had not yet paid 
their taxes. Often dubbed the “Nudge Unit,” the Behavioral In- 
sights Team was created in 2010 at No. 10 Downing Street to use 
behavioral insights to improve how government works. The let- 
ters sent were identical but for one short paragraph, the difference 
sometimes consisting of one sentence only. A first group of recipi- 
ents read that “nine out of ten people pay their taxes on time.” A 
second group received a slightly amended version of this sentence: 
“Nine out of ten people in the UK pay their taxes on time.” And 
a third group saw the following two sentences: “Nine out of ten 
people in the UK pay their taxes on time. You are currently in the 
very small minority of people who have not paid us yet.” Others 
either received no additional message or an altogether different 
one that reminded them that paying taxes was important for the 
provision of public services. The third letter was the winner: it
generated tax revenue of more than $3 million in less than a month.
The team was intrigued and ran more experiments, all confirming
that messages that have the most impact tell readers about
what others do and point out that you are the outlier. 

Sharing such information establishes social norms. Generally, 
people want to run with the herd. Dozens of field experiments, 
similar to the one described above, establish this fact. Contrary to 
traditional economic theory, people do not vote when nobody else 
is voting. Think about this. In a rational world, we would expect 
people to be especially interested in voting as the odds increase 
that their vote would prove decisive. But this is not how it works. 
People copy others. They are more likely to vote when voter 
turnout is high and stay home when it is not. Similarly, people are 
more likely to lower their energy use, donate money in support 
of public radio, or recycle when informed that most people are 
also doing these things. People are “conditionally cooperative” 
and are more likely to contribute to public goods when others do 
so as well.

People are even more likely to accept a job offer when informed 
that most others have done so in the past. In a field experiment 
involving thousands of recent college graduates who were accepted 
to teach in an underperforming school in the Teach for America 
program, the admitted were more likely to say yes when the offer 
letter mentioned the high fraction of people who had accepted the 
offer in the previous year. They were also more likely to follow 
through on their acceptance and still be working in the job half a 
year later. 

These studies suggest that we can turn descriptive norms, what 
many people are already doing, into prescriptive norms just by 
telling people about them. What is becomes what should be. People 
are generally more likely to adopt a behavior if they know that
most others are already doing it. We sometimes refer to this as
herding behavior: the behavior of others, the herd, informs us as to
what is normal, appropriate, or beneficial to do. And we then do 
likewise. 

An early powerful demonstration of this was run in a parking 
lot. The researchers put flyers on people’s windshields as well as 
on the ground. They were interested in learning what fraction of 
the people finding a flyer on their car windows would toss it on 
the floor as opposed to either keeping it or disposing of it properly. 
As you would expect, people were much more likely to throw 
their flyer on the ground if the parking lot was already littered. 
Robert Cialdini, one of the authors of this study, went on to ex- 
plore the impact of such norms in many different contexts. One 
of the more troubling findings is that people also do more bad 
things if others do them as well. In one study, he and collabora- 
tors wished to find out how to discourage visitors to Arizona’s 
Petrified Forest National Park from removing the petrified wood. 
They found that posting signs emphasizing that many others had 
stolen some made people more likely to do the same.

In spring 2013, I shared these insights with the UK Department
for Business, Innovation, and Skills, then led by Secretary
Vince Cable. The team I met with worked on increasing gender
diversity on corporate boards in the United Kingdom without
reliance on quotas. They were inspired by the work of the Behavioral
Insights Team and eager to apply some of their findings to
gender. They were also in a hurry as they were trying to help 
companies follow through on the goals set in the so-called Davies 
Report, written by Lord Davies of Abersoch. Following an inde- 
pendent review into the number of women on corporate boards 
launched in February 2011, Lord Davies recommended that UK
listed companies in the FTSE 100 set a minimum target for
25 percent female board member representation by 2015. Lord Davies
said: “Over the past 25 years the number of women in full-time
employment has increased by more than a third and there
have been many steps towards gender equality in the workplace, 
with flexible working and the Equal Pay Act, however, there is 
still a long way to go. Currently 18 FTSE 100 companies have no 
female directors at all and nearly half of all FTSE 250 companies 
do not have a woman in the boardroom. Radical change is needed 
in the mindset of the business community if we are to implement 
the scale of change that is needed.” 

Similarly, Theresa May, then home secretary and minister for 
women and equality, commented: “Women make up more than 
half of the population, but account for just 12.5 per cent of FTSE 
100 directors. Lord Davies’ report is an important step forward in 
understanding why this is and what can be done about it, and I 
shall be considering his findings very carefully.”

The illustration on the cover of the report graphically made 
this point. But was this smart messaging? Based on insights about 
the relevance of norms in other domains made by the director of 
the Behavioral Insights Team, David Halpern, I was skeptical. 
Was it really smart to focus on the small minority that women 
represented? Couldn’t this data point become self-fulfilling, sug- 
gesting that this state of affairs is the norm? After all, when it 
comes to gender we are up against views that are likely more 
strongly held than those about littering, voting, or tax collection. 
In an analysis of attitudes espoused in the World Values Surveys 
of 1990, 1995, and 1999, for example, Nicole Fortin found that 
across twenty-five OECD countries, women were less likely to 
be employed outside of the home in countries where a majority
agreed with the statement, “When jobs are scarce, men should
have more right to a job than women.” Around the world, perceptions
of men as breadwinners and women as homemakers still
matter a lot.

Representing the fraction of women on corporate boards

I suggested the Department for Business, Innovation, and Skills
run a field experiment to evaluate the effectiveness of these
communication strategies. Maybe they worked just fine. Perhaps norms
did not play the same role in the gender domain as they did in
other domains, and focusing on the fact that most companies and 
most countries had very few women leaders did not hurt the cause. 
Or, alternatively, perhaps there was something useful to be learned 
about encouraging gender diversity. What if instead of describing 
the small fraction of female corporate directors, messages focused 
on the large fraction of companies with gender-diverse boards? In 
fact, I brought an image created by Kerry Conley of the Women 
and Public Policy Program to the meeting to illustrate what this 
could look like. 

The department never ran the study. My suggestion was 
crowded out by other felt imperatives. I did note, however, that 
in 2013, Secretary Cable no longer focused on the small share of 
women on boards but instead said: “Today 94 of the FTSE 100 
companies count women on their boards as do over two thirds of 
all FTSE 350 companies.” 

By the end of 2015, the FTSE 100 companies had more than 
25 percent female directors on their boards. A success story? 
Mostly. The best answer is an enthusiastic yes with a little caveat. 
The achievement certainly is a huge success. After years of very 
little movement, the United Kingdom was able to more than 
double the fraction of women on its corporate boards—all without 
coercion. Much work went into making this happen, by the gov- 
ernment, the private sector, and by nongovernmental organiza- 
tions such as the 30% Club, devoted to increasing the fraction of 
female directors on FTSE 100 boards to 30 percent.

The caveat relates to what was learned in the process. Unfor- 
tunately, the department never ran a controlled trial, so we will 
never know what the impact of the reframed communication was.
Did focusing people’s attention on the new norm of having
gender-diverse boards matter, or was Britain’s big step forward
attributable to some of the many additional interventions used?

Representing the fraction of gender-diverse corporate boards

While we will never quite know, my colleagues and I could not 
let the question rest without some sort of a research-based answer.
Short of a field experiment, we brought the question to the
lab—and were shocked by what we found. We had a first set of
student subjects play the role of “employer” and asked them to 
hire a team of five people for a math and a verbal task. We chose 
these tasks, stereotypically associated with men and women, on 
purpose as we wanted to see whether a norm nudge could correct 
stereotypical behavior. This first set of employers was the norm 
creator. We ran various sessions and then informed a second set 
of “employers” of what experimental participants in a previous 
session had done. 

It turns out that when this new set of employers had no infor- 
mation on what choices others had made, male employers tended 
to choose slightly stereotypically, with about 60 percent choosing 
majority-male teams for the math task and majority-female teams 
for the verbal task. Female employers went with a 50:50 split on 
both tasks, on average. When we informed them about a session 
where most employers had chosen majority-female teams for both 
math and verbal tasks, men reacted by choosing fewer women. 
Under that condition, about 70 percent chose majority-male teams, 
in both the math and the verbal tasks! The nudge had moved them 
in the wrong direction and cost women 10 percentage points in 
the math task and 30 percentage points in the verbal task. In con- 
trast, when we informed male employers about a session where 
most employers had chosen majority-male teams, not much 
happened. Neither did anything happen for female employers. On 
average, women pretty much chose male and female employees 
equally, independent of the information provided.

Such male defense of the status quo is not unprecedented. After 
all, we are in a zero-sum world. Increasing the fraction of one 
gender in a team means decreasing the share of the other gender—
unless one can make the team bigger. Deutsche Telekom was one
of the first large companies, and the first DAX-30 company, to
introduce a gender quota for its middle and senior management. 
On March 15, 2010, it announced that by the end of 2015, it 
wanted to have 30 percent women in its middle and senior ranks. 
As has happened in other companies with gender diversity goals, 
many men were not excited about this prospect. They saw, quite 
literally, their slice of the middle and upper management pie 
shrinking in front of their eyes. German journalists wrote about 
“discrimination of men,” the “battle of the sexes,” and asked 
“where to put the men.”

Thus, male resistance to interventions favoring women is real, 
whether in the laboratory, in corporate offices, and, maybe, even 
in board rooms. Aaron Dhir, in his 2015 book, Challenging Board- 
room Homogeneity, argues that gender diversity norms, while often 
espoused in theory, have not become a reality in most board rooms 
yet. He is more optimistic about the United Kingdom. The UK 
government appears to have been able to play the role of a “norm 
entrepreneur,” a term coined by Cass Sunstein in 1996. Without 
relying on legally mandated gender representation on boards, it 
has been able to move the needle. 

Norm entrepreneurs build on latently available notions of right 
and wrong, even if people treat such notions more as theoretical 
concerns rather than guidelines determining their actions. Dhir 
believes that the United States could be fertile ground for norm 
entrepreneurship around diversity. He reports on the many busi- 
nesses that supported the US Supreme Court’s 2003 decision to 
uphold the University of Michigan Law School’s affirmative ac- 
tion policy in Grutter v. Bollinger. The firms filed an amicus curiae 
brief arguing that American businesses needed to have access to a
diverse talent pool to compete in an increasingly global world with
ever-growing diversity. A few years later, many of the same firms
submitted another brief in Fisher v. University of Texas at Austin, ba- 
sically repeating their earlier arguments and stating that the case 
for diversity had become even more compelling.

Norm entrepreneurship can help these organizations embrace 
diversity not just as a principle but also as a practice. Invoking what 
others do appears to be more likely to work at resetting norms the 
less people perceive the consequences as zero-sum. Organizations 
need to find ways to increase the pie, for example, by increasing the 
size of executive committees or boards, an approach that many 
Norwegian companies opted for when they had to comply with the 
quota of having 40 percent female directors. The fixed-pie mentality 
is a well-known barrier to creative problem solving. How people 
see competition matters. If people perceive every additional person 
joining the labor force as a threat, they will be less welcoming of 
new entrants, including women. In her analysis of the relationship 
between people’s attitudes and women’s workforce participation, 
Fortin finds that in countries where men had a more favorable 
view of competition, women’s employment rates were higher.

Nevertheless, some constraints are real, and choices have to be 
made. Men who have been standing in line for the next top job 
will not be excited about additional competition from women. 
That this is so should strike us as neither surprising nor new. Those 
who benefit from existing practices and norms generally do not 
cheer when barriers to entry for new competitors are lowered; 
monopolies and cartels rarely go quietly. But ignoring those con- 
cerns can backfire, as the research on intergroup threat suggests. 
In the worst case, increasing women’s economic independence 
has led to a surge in domestic violence, evidence from Bangladesh
and India suggests. Thus, when changing norms, taking both
winners and losers into account is the prudent and the pragmatic
approach.

One such approach is reaping the benefits that can come from 
people comparing themselves with others. This means copying 
others, yes, but it also implies competing with others. Take 
Opower, a company based in the United States, as an example. I 
have immediate experience with their intervention, being the re- 
cipient and beneficiary of it. Opower has our utility company, 
National Grid, send our household a personalized Home En- 
ergy Report that compares my family’s energy consumption with 
how much my neighbors consume. 

We are currently doing quite well, outperforming our neigh- 
bors. But this was not always the case. When we got our first report, 
we were in the worst category. This, we decided, was unaccept- 
able. We had our roof insulated and solar panels installed. In 
addition, we now follow the “Energy Saving Tips” provided by 
Opower, keep our house warmer in the summer, and wear an ad- 
ditional sweater in the winter. The latter steps in particular were 
simple things, and all the changes we embraced substantially 
decreased our energy use and helped us save money. In the United 
States, seventy utility companies have implemented Opower pro- 
grams and more than 8 million households are in their experi- 
mental populations. This has made Opower one of my favorite 
examples of a successful design when giving a talk about the power 
of norms. Invariably, someone in the audience has been subject 
to this intervention. 

In fact, my family and many others have truly been “subjects.” 
Opower has conducted a number of field experiments, using both 
treatment and control groups. On average, their interventions
have significant short-term and long-term effects, decreasing
energy use even after households have received the mailing and
leading to a lower level of consumption after the program has
been discontinued.

But, as almost always occurs when researching behavioral in- 
sights, Opower’s interventions do not work equally well with all 
subjects or all the time. Some studies have found the Opower in- 
tervention can lead to backlash among households holding more 
conservative views, causing these homes to increase their energy 
consumption. Others report a boomerang effect, where people 
increased their usage after learning that they were consuming less 
than average. This reminds me of the moral licensing effect we dis- 
cussed in Chapter 2, where people who have (or just believe they 
have) done something good feel licensed to do something bad. I 
have watched this unfold in my own house. Since joining the ranks 
of better-than-average energy users in our neighborhood, my hus- 
band has gotten into the habit of reminding the rest of us (who 
are not so great at turning off lights) that we should not feel li- 
censed to misbehave only because we decreased our use of air 
conditioning.

Accepting the caveats of backlash and moral licensing, Opow- 
er’s intervention is cheap and effective. It stands atop a wealth of 
research. By making the appropriate comparisons, we can focus 
people’s attention on specific aspects of a problem. Imagine you are 
a development officer wishing to increase donations to your charity. 
You must design a flyer (or email blast) to past contributors. What 
do you draw the recipients’ attention to? You would likely focus on 
what people have contributed in the past, maybe creating tiers (as 
many organizations do) that indicate whether people have given at 
the gold, silver, or bronze level. Certainly, you would not rank
people in terms of how much they kept for themselves or how
much they still have left over after having donated. A simple
laboratory experiment confirmed this. Participants were placed in a
dictator game, where individuals were given a certain amount of 
money that they were to divide as they wished between themselves 
and an anonymous person. Researchers found that people gave 
substantially more in a “generosity tournament,” where they were 
ranked publicly from most to least generous, as compared to an 
“earnings tournament,” where they were ranked based on how 
much they had kept for themselves. In brief, social comparisons 
matter—on almost anything that we can measure—but we have to 
focus people’s attention on the outcome that we care about. 

Exploring this further, Richard Zeckhauser and I became in- 
terested in whether people also take fairness clues from others. We 
ran a modified version of the ultimatum game introduced in 
Chapter 11. It turns the dictator game described above into a bar- 
gaining game where the proposer can offer the receiver a sum of 
money which she can accept or reject. If she rejects it, however, 
all the money is lost, including the amount that the proposer was 
hoping to keep for himself. If the receiver accepts, the deal stays 
as proposed. When Richard and I told all participants that we 
would inform the receivers of the average amount given by the 
proposers before they had to make their accept/reject decision, 
proposers converged on a norm. In our case, this was an equal 
split. Proposers feared that receivers would punish deviations from 
what others did, and they were right.

Making public and visible how well a company or country does 
in terms of gender equality compared to others might also promote
convergence on a new norm. Indeed, a number of organizations
now provide social comparisons or explicit rankings based
on gender equality. In 2006, the World Economic Forum (WEF)
launched its annual Global Gender Gap Report measuring the
existing gender gaps in four categories: economic participation 
and opportunity (pay, participation, and leadership), political em- 
powerment (representation), education (access), and health and 
survival (life expectancy and sex ratio at birth). Since then, the 
WEF has annually published a report measuring how the gaps are 
changing over time. It ranks countries on their overall perfor- 
mance, as well as on how well they do in all four categories. Over 
nine years, the Nordic countries have been leading the pack with 
Iceland having closed the overall gap by 87.3 percent (with 
100 percent indicating gender equality) as of 2013. Generally, 
Middle Eastern and North African countries have fared worst, 
with Yemen having closed only 51.2 percent of the overall gap. 

Saadia Zahidi of the WEF and the lead author of the report 
explains: “The notion that gender equality is not only the right 
thing to do, but the smart thing is a fairly new mindset that did 
not exist in the public consciousness even five years ago.” Klaus 
Schwab, the founder and chairman of the WEF, adds: “Achieving 
gender equality is obviously necessary for economic reasons. Only 
those economies who have full access to all their talent will re- 
main competitive and will prosper. But even more important, 
gender equality is a matter of justice. As a humanity, we also have 
the obligation to ensure a balanced set of values.”

Related country reports and rankings, each with a slightly 
different focus and methodology, have since been created by 
the World Bank, the United Nations Development Program 
(UNDP), the OECD, and the European Institute for Gender 
Equality, among others. While what facets of gender equality 
they measure and track differ—some focus on gaps between men
and women, for example, in economic opportunity, and others
look at absolute rates, for example, in terms of workforce
participation; some cover outcomes, such as compensation, and others
measure input variables, such as laws; some include measures 
of violence and others of access to finance—they all generally 
come to similar conclusions: gender equality is quite advanced 
in the Scandinavian countries while the nations of the Middle 
East, North Africa, and Sub-Saharan Africa have the longest way 
to go. 


• In 2009, the World Bank started to collect data on Women,
Business, and the Law, which analyzes the legal differentiations
on the basis of gender in 143 economies around the
world. The report covers six areas in relation to gender:
accessing institutions, using property, getting a job,
providing incentives to work, building credit, and going
to court. Based on the 2014 report, a seventh area,
protecting women from violence, has recently been added for
100 economies.

• In 2009, the OECD’s Development Center published its
first Social Institutions and Gender Index (SIGI), which
provides a composite index of gender inequality. The
measure uses five subindices to calculate this score:
discriminatory family code, restricted physical integrity,
son bias, restricted resources and assets, and restricted
civil liberties.

• In 2010, the UNDP launched its Gender Inequality Index
(GII), which is a composite measure that calculates differences
in the distribution of achievements between women
and men across countries. The measure uses three dimensions
to calculate this score: (1) reproductive health measured
by maternal mortality ratio and adolescent birth rates; (2)
empowerment measured by proportion of parliamentary
seats occupied by females and proportion of adult females
and males aged 25 years and older with at least some
secondary education; and (3) labor market participation rates.

• In 2013, the European Institute for Gender Equality
launched its Gender Equality Index report, where it
documents the position of the European Union in terms
of gender equality. It ranks EU countries based on their
gender gaps in several sub-categories including work,
money, knowledge, time, health, power, violence, and
other intersecting inequalities.


Judith Kelley of Duke and Beth Simmons of Harvard argue 
that rankings have become an effective instrument of “soft power” 
in international governance, including on some of the biggest 
challenges of our time such as human trafficking. Countries which 
have been included in the annual US Trafficking in Persons Report or 
have been placed on the “watch list” are more likely to crimi- 
nalize trafficking. Corporations have also been found to respond 
to ratings, for example, with respect to their impact on the envi- 
ronment. Researchers evaluated how several hundred firms re- 
acted to the ratings issued by a well-known social rating agency. 
The poorly performing companies improved their performance 
after the ratings became public while companies that were never 
rated or did better in the initial evaluation did not change.

Mike Luca of Harvard Business School and his collaborators 
demonstrated the impact of rankings on students’ application 
decisions. Improvements in a college’s rank on US News and World
Report Best College Rankings immediately translate into a
larger number of applications. Interestingly enough, however, the
authors only find this effect when the magazine presents the col- 
leges ordered by rank. When the colleges are listed alphabetically, 
with their rank included in the body copy describing the institu- 
tion, no effect on applications could be detected. Only easily 
understandable and highly visible comparisons mattered, some- 
thing gender equality designers hoping to influence behavior 
need to keep in mind. A final but important consideration is this: 
precisely because people and organizations care about their relative 
standing, rankings may motivate attempts at manipulation. Ben 
Edelman and Ian Larkin, for example, show that when people fall 
behind, they may engage in deception to boost their rank.

Laws and regulations can work in a similar fashion. Just as 
learning about what others do can make us want to follow the herd 
or even outperform it, learning what is approved or sanctioned, 
either formally by law or informally by social norms, also provides 
information on what acceptable behavior is. The law, thus, often 
goes beyond deterrence; it also has an “expressive function.” Even 
if the expected cost of violating the law is too small to actually 
deter, people may take a law as a signal for what the social norm 
regarding a certain behavior is. 

Take jaywalking. Growing up in Switzerland, where people 
only cross the street when the light tells them to, I used not to 
jaywalk. Then I moved to Cambridge, Massachusetts, and watched 
in disbelief how most everyone (but the tourists) crossed the streets 
in Harvard Square with complete disregard for the colors on the 
traffic lights. Eventually, I started doing the same—although to 
this day, I always check whether there are any children about and, 
if I am walking with someone, wait to see what he or she is doing. 


Clearly, jaywalking norms in Switzerland and in Cambridge are
very different. I doubt that the likelihood of being fined or the
size of the fine dramatically differ, so the difference in behavior 
lies elsewhere. Economists would say the two places have settled 
on two different equilibria, a jaywalking and a nonjaywalking one. 
Without an intervention, it is unlikely that the norms in either 
place would change. A habitual jaywalker might be incredulous: 
no intervention is going to disrupt so ingrained a habit, one that 
verges, for some, on a perceived right. Odds are, however, that 
jaywalker isn’t also a smoker. 

In the United States, smoking in public was quite common 
until relatively recently. But then, legislation translated an emerging 
shift in the public’s views into something concrete. Nonsmokers 
who wanted others to refrain from smoking, particularly anywhere 
near them, now had the moral authority to do so. Social norms 
were changing from a smoking to a nonsmoking equilibrium, 
constantly being reinforced by people’s increasing willingness to 
sanction deviations. Similar arguments may apply to large differ- 
ences across locations in compliance with laws and regulations 
governing speeding, shoplifting, unethical behavior in organiza- 
tions, or riding public transportation without paying. Laws and 
regulations activate informal systems of control. They change how 
we interpret actions. For example, the National Hockey League’s 
requirement that players wear helmets completely reframed the 
discussion as to what was proper, acceptable, and to be encour- 
aged. Wearing a helmet, which had been considered a sign of 
weakness or even unmanliness, now became the norm.

Title IX, one of the landmark laws in the United States, offers 
another example of the power of regulation to set new norms with 
far-reaching consequences. Signed into law on June 23, 1972, by
President Richard Nixon, Title IX prohibits gender discrimination
in any federally funded education program or activity. It is
best known for opening doors for girls and women who wished 
to compete in sports, but it has nine other provisions that influ- 
enced norms well outside of sports, including protecting students 
from sexual harassment, providing equal access to higher educa- 
tion, math, and the sciences, and fair treatment for pregnant and 
parenting students. Specifically, the law mandates: “No person in 
the United States shall, on the basis of sex, be excluded from par- 
ticipation in, be denied the benefits of, or be subjected to discrimi- 
nation under any education program or activity receiving Federal 
financial assistance.” 

Before it passed, fewer than 300,000 girls had access to high 
school sports in the United States. Today, it is estimated that more 
than 3 million girls participate. Given the central role athletics 
plays in American educational culture, this is a massive shift. In- 
terestingly, the law’s advocates had an inkling of what was to come 
and deliberately kept it quiet. Celebrating the fortieth anniversary 
of the law in 2012, a documentary, Sporting Chance, tells how Title 
IX came to be. Bernice Sandler, an American women’s rights 
activist who worked with Edith Green, congresswoman from 
Oregon, and Birch Bayh, senator from Indiana—two of its key 
proponents—recalls in the film how Green wanted to avoid 
drawing too much attention to the law as she feared that if people 
understood its potential to promote social change they might op- 
pose it: “I don’t want you to lobby. Because if you lobby, people 
will ask questions about this bill, and they will find out what it 
would really do.” Sandler continues: “And she was absolutely right. 
It was quite a big break that no one was watching.” 

The impact of Title IX only became apparent a few years after
it passed when it started to trigger proposed amendments, including
an attempt to exempt “money-making” sports from Title IX,
various Supreme Court cases, and a host of political actions. Con- 
doleezza Rice, former US secretary of state, recalled how Title IX 
overturned social norms in her former hometown of Birmingham, 
Alabama, then the most segregated big city in the United States. 
She spoke about “the tremendous explosion of opportunity for 
young women on the playing field.” And off it. “So, I very often 
think of Title IX as trying to do away with discrimination but 
really, being about giving opportunity.” 

Many rules serve expressive functions that often go unexam- 
ined. They can informally sanction and reward behavior, which 
in turn can prove a liability. Take the example of Lee Iacocca, 
president of the Ford Motor Company from 1970 to 1978, the year 
he was fired. Iacocca has been described as the driving force behind 
the Ford Pinto. In 1977, allegations were raised that the Pinto’s 
structural design was compromised, creating a fire hazard. In 1978, 
1.5 million of the subcompacts were recalled to install fuel-tank 
protection. In The Ford Pinto Case, their case study on applied ethics 
taught in many business schools, Douglas Birsch and John Fielder 
report that safety concerns were not the norm at Ford: “Iacocca 
was fond of saying, ‘Safety doesn’t sell.’” Coming from the top,
that sent a strong message about company values and what is
rewarded and what sanctioned, formally and informally.

What companies live and breathe matters, likely more than any 
written corporate codes of conduct. Norm entrepreneurs in orga- 
nizations and in public policy promote behaviors by harnessing 
people’s desire to imitate, compete, and gain social approval. Per- 
haps the most encouraging testimonial to the powers of norm 
entrepreneurship is the UK government. Its success at more
than doubling the share of female board members without relying
on quotas should be a clarion call to others.


Designing Gender Equality—Become a Norm Entrepreneur 


• Make others’ successes in increasing gender diversity salient.

• Use rankings to motivate people to compete on gender
equality.

• Use rules, laws, and codes of conduct to express norms.


13 


Increasing Transparency 


Hungry tourists in Los Angeles, or perhaps longtime residents vis- 
iting a new neighborhood, face three restaurants they have never 
eaten at before. On the door of one, easily visible, is a white piece
of paper with a large letter C on it. A quick closer glance and black 
letters, smaller, above the C read, “Sanitary Inspection Grade.” 
Its close neighbor has a letter B posted on its door. And beside it, 
the third restaurant has a letter A in its window. Assuming our 
hungry visitors are broadly agnostic about whether they eat Chi- 
nese or Japanese or Italian, and pricing is roughly equivalent 
across all three choices, which restaurant do you think they enter? 

That is the power of transparency. Consider a different ex- 
ample. When you last booked an airplane ticket online, did you 
read the disclosure statement? If you are like most everyone else, 
you did not. Instead, you just checked the appropriate box 
acknowledging that you were okay with the company’s policies. 
If I asked you what fraction of consumers you imagine read the 
privacy disclosures on websites, you would probably guess that it
is small. But would you have thought that it is only 3 percent, as a
research study suggests? We do not do much better reading warning
labels on products themselves, according to a review of about 
400 studies. Most consumers ignore them. So, why do we
(governments, companies, consumers) bother?

For starters, some companies overestimate the influence disclosure
requirements have on consumers. In what is known as the spotlight
effect, we tend to have an exaggerated expectation of others’ awareness
of our actions. In addition, companies know that it sometimes
takes only a few active consumers or consumer groups to
raise concerns, making a wider public pay attention. Consequently, 
not only do companies often respond quickly to initial concerns, 
they take disclosure requirements seriously, even when they know 
only a small percentage of consumers give them much time and 
attention. When the European Union and the United States 
mandated energy efficiency labels on appliances, for example, 
manufacturers started immediately to innovate, offering more 
energy-efficient products even before consumers began to demand 
different kinds of appliances.

At the same time, disclosure requirements too often do not 
reach their potential. Poorly designed or implemented, their in- 
fluence is blunted. A few behavioral design principles can help, and 
sometimes spectacularly so. The more salient and visible informa- 
tion is, the more likely people are to notice. The easier it is for 
consumers to read disclosures, from the size of the typeface to how 
simple and direct the language, the more likely consumers are to
read them. And the more information is put into a comparative
context (“with this car you save $1,850 in fuel costs over five years
compared to the average new vehicle”), the more people are able
to understand it. The same behavioral rules discussed in Chapter 6
for people evaluations apply whenever we need to process information,
particularly when the information is complex or unfamiliar.
To increase the chances that we process it accurately, information
needs to be salient, simple, and in comparative context.

This is why, at least to a behavioral economist, the energy 
labels on appliances and cars sold in the United States qualify as 
beautiful. The information is provided comparatively so that cus- 
tomers can calibrate, for example, the operating cost of a specific 
car relative to others. In addition, complicated metrics such as 
MPG—miles per gallon—have been replaced by better ones. 
Why? Because linear metrics are simpler than nonlinear ones. 
Miles per gallon is nonlinear: about one gallon is saved per 100 miles if
the MPG changes from ten to eleven or from thirty-three to fifty.
Gallons per hundred miles, or GPhM, gives customers a better 
sense of what is going on. In this car, you go one hundred miles 
and you consume this many gallons; in another car, it’s more, or 
less. GPhM is linearly related to consumption and cost. 
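
The arithmetic behind this comparison can be checked with a minimal Python sketch. The function names are illustrative only and do not come from the EPA or any labeling tool; the two MPG pairs are the ones mentioned in the text.

```python
def gallons_per_100_miles(mpg: float) -> float:
    """Convert miles per gallon (MPG) to gallons per hundred miles (GPhM)."""
    if mpg <= 0:
        raise ValueError("mpg must be positive")
    return 100.0 / mpg


def gallons_saved_per_100_miles(old_mpg: float, new_mpg: float) -> float:
    """Fuel saved over 100 miles when moving from old_mpg to new_mpg."""
    return gallons_per_100_miles(old_mpg) - gallons_per_100_miles(new_mpg)


# A 1-MPG gain at the low end and a 17-MPG gain at the high end
# save roughly the same amount of fuel.
low_end = gallons_saved_per_100_miles(10, 11)
high_end = gallons_saved_per_100_miles(33, 50)

print(f"10 -> 11 MPG saves {low_end:.2f} gallons per 100 miles")
print(f"33 -> 50 MPG saves {high_end:.2f} gallons per 100 miles")
```

Both improvements come out at roughly one gallon per hundred miles, which is why a customer comparing MPG figures alone can badly misjudge where the savings are.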

For example, the 2015 Toyota Prius is a hybrid vehicle that runs 
on gasoline with a suggested retail price of about $24,000 to 
$30,000. The average gallons per 100 miles are 2.1, an average es- 
timate based on six vehicles. If you have a Toyota Prius, the EPA 
reports that you save $4,500 in fuel costs over five years compared 
to the average new vehicle. Salient, simple, and comparative.

One of the most convincing advocates for simplification is Cass 
Sunstein, formerly head of the Office of Information and Regula- 
tory Affairs, also known as America’s “Regulatory Czar” under 
President Barack Obama. His wonderful book Simpler, published 
shortly after he left office, uses examples of many of the simplifi- 
cations he helped oversee and provides helpful guidelines on how 
we can simplify our communications and make them more effective.
These include a number of disclosure requirements “designed
to protect students, consumers, and investors by ensuring that they
‘know before they owe.’” The Credit Card Accountability, Responsibility
and Disclosure Act of 2009, for example, requires clear
disclosure of annual percentage rates and finance charges, and ad- 
vance notice of changes. Your monthly credit card statements now 
include a “minimum payment warning” explaining how long it 
would take you to pay off the card by submitting just the minimum 
and how much this would cost. To make customers even smarter 
about debt, Mike Luca argued that this transparency should be 
extended from paper copies to online statements. He shows 
how easily credit card companies could create an online tool that 
would help customers understand the long-term implications of 
different payment amounts.
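
What such a tool might compute can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Luca's proposal or any issuer's actual formula: the $3,000 balance, the 18 percent annual rate, and the minimum-payment rule (2 percent of the balance or $25, whichever is larger) are hypothetical figures chosen for the example.

```python
def payoff_schedule(balance, apr, payment_for, max_months=1200):
    """Simulate monthly payments; return (months, total_interest_paid).

    payment_for is a function mapping the current balance to the
    payment the customer makes that month.
    """
    monthly_rate = apr / 12
    months = 0
    total_interest = 0.0
    while balance > 0.005 and months < max_months:
        interest = balance * monthly_rate
        # Never pay more than what is owed this month.
        payment = min(payment_for(balance), balance + interest)
        if payment <= interest:
            raise ValueError("payment never reduces the balance")
        balance = balance + interest - payment
        total_interest += interest
        months += 1
    return months, total_interest


def minimum_payment(balance):
    # Hypothetical issuer rule: 2% of the balance or $25, whichever is larger.
    return max(0.02 * balance, 25.0)


def fixed_payment(amount):
    # The customer pays the same amount every month.
    return lambda balance: amount


START_BALANCE = 3000.0  # assumed for illustration
APR = 0.18              # assumed for illustration

months_min, interest_min = payoff_schedule(START_BALANCE, APR, minimum_payment)
months_fixed, interest_fixed = payoff_schedule(START_BALANCE, APR, fixed_payment(150.0))

print(f"Minimum only: {months_min} months, ${interest_min:,.2f} in interest")
print(f"$150 a month: {months_fixed} months, ${interest_fixed:,.2f} in interest")
```

Putting the two results side by side is the point: the comparison, rather than either number alone, shows a customer what paying only the minimum costs.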

Sunstein was also involved in helping Americans rethink what 
they eat. For a great many reasons, the government wished Amer- 
icans to be healthier than not, and encouraging more Americans 
to eat healthier food seemed, well, low-hanging fruit. Except for 
many years it wasn’t. Since 1992 the US Department of Agri- 
culture had used a food pyramid to help Americans understand 
how many fruits and vegetables to eat as compared to, say, meats. 
It turns out the pyramid did not work well for most people. In 
their book Switch, Chip and Dan Heath didn’t mince words: 
the pyramid’s message is “opaque... , confuses and demoral- 
izes.” Sunstein was in a position to do something about it. Ac- 
cordingly, the Obama administration searched for a better image. 
And sometimes, simple really means simple: the pyramid was out 
and in its place was a dinner plate. Unevenly quartered to rec- 
ommend a healthy balance of vegetables, fruit, protein, and grains, 
it also showed a nearby glass representing dairy. The government’s
MyPlate image is a great example of effectively transparent
information. Unlike the pyramid, MyPlate is a mirror to your
own plate, and the ease of comparison—perhaps you are used to 
seeing half of your plate taken up with steak rather than the sug- 
gested slightly less than a quarter—is immediate. And revela- 
tory. As a yogurt lover, I confess to being a bit shocked by how 
small the dairy cup is.

MyPlate is not the only much-celebrated example of successful 
disclosure to encourage healthier eating. Recall our visitors 
weighing dining choices who opened this chapter. Public displays 
of hygiene ratings in restaurants have proven a highly effective 
intervention in transparency. In January 1998, Los Angeles’s De- 
partment of Health Services decided that going forward restau- 
rants would be required to post a grade card (with letter grades 
A, B, or C) in their windows indicating how the department 
had rated them during their most recent hygiene inspection. The 
result was dramatically improved hygiene in all city restaurants. 
Due to the salience, simplicity, and easy comparability of the infor- 
mation displayed, patrons started to pay attention to restaurants’ 
hygiene. The number of foodborne illnesses decreased, both 
because people switched from lower-quality to higher-quality 
restaurants and because restaurants improved their sanitation. 
Over time, health inspection ratings increased citywide. 

The next leap forward in eating confidently at a new restau- 
rant might come with data analytics. Today, many cities have web- 
sites or apps where consumers can learn about how well their 
favorite restaurants did on their hygiene inspections. New York 
has an app called ABCEats and San Francisco collaborates with 
the review website Yelp to make restaurant health information 
accessible to the public. Indeed, making use of the large amounts
of data created through customer reviews, researchers are now
able to build algorithms that predict restaurant quality. According
to Mike Luca and Yejin Choi, their statistical model based solely 
on customer reviews is able to discriminate with more than 
80 percent accuracy between restaurants you likely want to avoid 
and restaurants with no history of unsanitary conditions.

Information disclosure can make us healthier, safer, wiser, and
more responsible. It has the potential of reaping large benefits at 
relatively low cost, it remains an attractive tool for policy makers 
and regulators, and it is now increasingly being used to make or- 
ganizations more diverse. A tool is only as useful as it is well de- 
signed, of course. An unimpressive effort was initiated by the 
Securities and Exchange Commission in 2010. The SEC ruled 
that companies had to disclose how they considered diversity 
when they selected their boards. Specifically, the rule requires
public companies to disclose “whether diversity is a factor in
considering candidates for nomination to the board of directors, 
how diversity is considered in that process, and how the company 
assesses the effectiveness of its policy for considering diversity.”

By now, you likely can guess why this approach isn’t effective.
It fails on salience, its message is opaque, and there is zero effort 
at comparative insight. Here’s a simple suggestion: if you are going 
to advocate for diversity, be prepared to define it, simply. The SEC 
did not. Indeed, the only saving grace of its effort was the transparency
of its ambition: the new rules, the SEC said, were “not
intended to steer behavior.” Unsurprisingly, they did not. Aaron 
Dhir finds in his study that companies frequently did not include 
demographic factors when considering diversity. Instead, they de- 
fined diversity in terms of directors’ prior experiences. While the 
exclusion of identity-based factors has raised concerns, including
from a number of large institutional investors, Dhir remains
skeptical that the SEC rule will have any noticeable impact on
gender or other demographic diversity on US corporate boards. 

Most other countries that have adopted disclosure requirements 
have offered specific definitions of diversity. For example, the 
Code of Corporate Governance in Singapore specifies that “the 
board committees should comprise directors who as a group pro- 
vide an appropriate balance and diversity of skills, experience, 
gender and knowledge of the company.” In 2014, the Singaporean 
Diversity Task Force regarding Women on Boards issued a report 
detailing specific recommendations on how to increase diversity 
on boards, including the publication of rankings. 

Some countries have gone further, adopting a comply-or-explain 
approach. Regulators provide guidance on what they consider 
good policy, organizational practices, and even outcomes, and ask 
companies to either comply or publicly disclose why they did not. 
For example, corporate governance codes in Germany, the Neth- 
erlands, and the United Kingdom apply this approach, setting 
standards for boards’ audit and compensation committees. The 
disclosed information helps investors, proxy advisors, and share- 
holders evaluate a board’s decisions, actions, and outcomes and take 
appropriate action. 

With the comply-or-explain approach, the government in ef- 
fect sets a soft default for companies. It defines what it considers 
to be the desired course of affairs and asks companies opting out
of it to explain why. While not restricting choices, such soft
defaults create a reference point that people dislike leaving. And 
they leverage our inertia, with people and organizations avoiding 
or delaying change that might be costly and painful. Default setting
is a powerful instrument in a behavioral designer’s toolbox.
When opting out is required to change the status quo, the enrollment
rate in retirement savings plans has been shown to be up to
almost 40 percentage points higher in opt-out than in opt-in plans.

An increasing number of countries are now using the comply- 
or-explain approach to promote gender diversity. The Australian 
Securities Exchange, for example, asks companies to report an- 
nually their diversity policies and degree of diversity. They must 
not only report the fraction of female employees overall, but 
also the number of women in senior positions and on a com- 
pany’s board of directors. Finally, they must spell out their 
overall goals regarding gender diversity and to what degree they 
have met them. Similarly, in Canada, the Ontario Securities 
Commission introduced comply-or-explain rules in 2014 that 
required companies listed on the Toronto Stock Exchange to
annually disclose:


• “whether the issuer has a written policy regarding the representation
of women on the board and if not, why not;

• whether its board or nominating committee considered the level
of representation of women in the director identification and
selection process and if not, why not;

• whether the issuer considers the representation of women in
executive positions when making executive appointments and if
not, why not;

• whether the issuer has targets for the representation of women
on its board and in executive positions, and the annual and
cumulative progress in achieving such targets, and where there
are no targets, why not; and

• the number and proportion of women on the board and in
executive positions of the company and each of its major
subsidiaries.”

Spurred by the Davies Report of 2011, which I introduced in
the previous chapter, the United Kingdom has likely gone the far- 
thest, using disclosure requirements toward successfully reaching 
its target of 25 percent female directors by the end of 2015. In 2015, 
there were no all-male boards left among the FTSE 100 compa- 
nies, which is a first in the history of the London Stock Exchange. 
In the same year, more than 90 percent of the FTSE 250 compa- 
nies were gender diverse, whereas in 2011, more than half of these 
boards were male only. In 2013, the Netherlands followed suit and 
set a 30 percent goal for both corporate supervisory and executive 
boards, specifying that at least 30 percent of the seats should be 
held by women. If a company fails to meet this goal, it must ex- 
plain why and what actions it plans to take to meet the goal in the 
future.

Disclosure requirements are popular in part because they 
help people make more informed decisions without limiting 
their autonomy. It is left up to shareholders, investors, analysts, 
and the public to decide how to use the disclosed information. 
Those unconcerned about gender may disregard it while those 
who care about the inclusion of both sexes in decision making 
and leadership may act on it. However, we know little about 
whether, how, or to what degree gender diversity disclosure is 
working to promote diversity. These provisions were recently 
introduced and were often accompanied by additional interven- 
tions, making it impossible to tease apart the particular relevance 
of disclosures. Also, specific disclosure rules vary greatly across 
the countries that have adopted them. One comparative fact, 
however, stands out. In the United States, where the disclosure 
requirements were unspecific and “not intended to steer behavior,”
the share of female directors has moved little in recent
years, with 16.1 percent female directors in 2011 and 19.2 percent 
in 2014. It is hard to imagine that the United Kingdom was able 
to surpass the United States in that same time frame without the 
Davies Report and the evaluative transparency that comes with 
disclosures.

Still, the question remains: does disclosure work? Disclosure 
in other domains, sometimes based on randomized controlled 
trials, offers insights and caveats. One facet of the US Patient Pro- 
tection and Affordable Care Act of 2010 sheds useful light. Most 
of us declare ourselves for gender equality, and nearly all employers 
would assure us that their desire and intent is to hire the most tal- 
ented employees. It is not unlike the majority of us who agree that 
eating healthily is a good thing. But when offered a choice be- 
tween French fries and salad as a side, many of us select the fries— 
much like the employer who, confronted with a choice between a 
more qualified candidate and one who shares his love of baseball, 
goes with “fit” instead of “ability.” 

The Affordable Care Act tries to address this disconnect be- 
tween good intentions and bad actions by mandating that calorie 
information be disclosed on the menu boards of chain restaurants 
with twenty or more locations. The evidence of the impact of cal- 
orie disclosure, however, is mixed. Even for the well-intentioned, 
it is hard to get disclosure right, and even when information meets 
the behavioral requirements of simplicity, salience, and compara- 
bility, it does not always change everyone’s behavior. Christina 
Roberto of Yale University and colleagues summarize the evidence
on calorie disclosure as follows: “The documented effects
of menu labeling on consumer and restaurant industry behavior
suggest that menu labeling will likely encourage some consumers
to eat more healthfully some of the time, and the policy is likely 
an important first step toward improving the public’s eating 
habits.” 

As the wording suggests, the effects proved modest overall and 
seemed to vary depending on the customer. Indeed, research found 
that women, normal-weight individuals, socio-demographically 
advantaged groups, and customers in some but not in other chain 
restaurants were more responsive to calorie information. In addi- 
tion, similar to the hygiene findings, the channels of greatest 
influence might work through the restaurants rather than the cus- 
tomers. One study examining menu offerings in fast food restau- 
rants between 2005 and 2011, during which calorie disclosure was 
introduced in a number of municipalities, found that menu offer- 
ings changed. Chain restaurants in communities where calorie 
information was posted included healthier choices than the 
same chain restaurants residing in areas where labeling was not 
required. 

Also, not all labels are created equal. Behavioral design features
matter, especially when it comes to comparisons. A systematic
review of experimental and quasi-experimental studies testing
the impact of disclosing calories to consumers found no impact 
when only calorie information was provided. In contrast, when 
that information was put into a context people could understand, 
they started to consume fewer calories. Knowing that a burrito 
contains 1,000 calories is one thing; knowing that the recom- 
mended daily caloric intake for an average adult is 2,000 calories 
puts that first number in context. Similar positive effects were reported
when the information was made more salient and simpler
to interpret, for example, by using the colors made familiar by
traffic lights (green, yellow, and red) to indicate whether one
should go ahead and consume, pause and consider, or stop and 
make different choices. 

Overall, simply disclosing nutritional information has little to 
no effect, and when rendered salient, clear, and comparative, that 
information has modest positive effects for some people. Even so, 
where is the harm? Disclosures seem to influence corporate be- 
havior. It is a relatively cheap intervention. And giving customers 
at least the option of a more informed choice is good, right? Right, 
except for the potential of it backfiring. One could well imagine 
that the moral licensing effect discussed earlier could also apply 
here. People might well compensate for picking a healthier main 
course by also picking the chocolate cake instead of the apple for 
dessert. 

Similarly, corporate boards might feel that by having disclosed 
how effective they consider their diversity efforts to be, as the SEC 
rule requires in the United States, they have done their job and 
can thereafter worry about other things. Indeed, there is some 
evidence that suggests in some circumstances disclosures benefit 
the person asked to disclose. Researchers have found that people 
judge others who have just lied as behaving more ethically if they 
had disclosed beforehand that they had an incentive to lie. Similarly,
people were more likely to trust others who disclosed a potential
conflict of interest.

These findings should make us vigilant, but they should not 
hold us back from advocating for more transparency. They sug- 
gest that we must be aware of potential backlash effects and work 
to mitigate them, starting with holding people accountable for 
their actions. On this front, the United States has taken an aggres- 
sive stance. On April 8, 2014, President Obama not only prohibited 
federal contractors from punishing employees who discuss their
salaries, as mentioned in Chapter 3, but went a step further. 
Though federal law and the Equal Pay Act of 1963 state that em- 
ployers cannot compensate men and women differently for the 
same work, enforcement of this mandate has proven difficult. The 
National Equal Pay Task Force, created by the president in 2010, 
found that the absence of wage data broken down by sex and race 
was one of the main culprits. To address this, a 2014 Presidential 
Memorandum required federal contractors to submit summary 
data to the Department of Labor on employee compensation strat- 
ified by sex and race. Several other countries, for example Aus- 
tralia, Austria, Belgium, and the United Kingdom, have likewise 
passed laws that emphasize the importance of transparency in de- 
creasing gender gaps in pay, promotion, and workforce composi- 
tion, going so far as to ask employers to produce action plans if 
they are found to have fallen short.

We need to help companies overcome the intention-action gap. 
Information disclosure, smartly designed, can help organizations act 
on their virtuous intentions to treat men and women equally and 
provide both equal opportunities. Providing simple, salient, and 
comparative information helps. Timing also matters. For example, 
people tend to respond more strongly to positive information that 
comes as an unexpected surprise. To illustrate the point, consider 
a simple online experiment examining how to increase produc- 
tivity. Three groups of data entry workers were offered a low 
wage rate, a high wage rate, or a low wage rate that was followed 
by the high wage rate after the workers had accepted the low offer. 
This pleasant surprise led to an increase in productivity of 20 percent 
compared to the other two groups, including the high wage rate
group where incentives were identical.

The introduction of a specific target might have helped the
United Kingdom meet its goal of increasing the share of women 
on corporate boards to 25 percent. Other organizations, such as 
the 30% Club, suggest that high but achievable aspirations can be 
powerful nudges. Much research in the realm of negotiations, 
for example, suggests that aspiration setting matters. Similar to 
setting personal goals, corporate targets may mobilize resources 
and focus attention. Because of this, they are not uncontroversial. 
As we have seen, setting targets in zero-sum contexts—making 
30 percent of the executive suite women when the overall number 
of executive-suite positions isn’t increased—can invite backlash 
from men. Others argue that women will be hurt by any real or 
perceived protected status. Yet others raise more general con- 
cerns: performance targets can increase stress, lead to fudged 
data, and undermine trust. While more research is needed to 
tease these various concerns apart and evaluate the impact of di- 
versity targets, many organizations continue to announce them. 
Examples span all sectors and include the Bank of England, 
Bayer, BMW, Daimler Chrysler, Deloitte, Deutsche Telekom, 
KPMG, Lloyds Bank, Louis Vuitton, Merck, and Qantas, among 
others, with targets typically focusing on adding women to se- 
nior management or corporate boards and numbers as ambitious 
as 45 percent.

Recent insights on goal setting suggest that organizations cur- 
rently introducing long-term targets might well be advised to set 
smaller, interim goals. Setting sub-goals has been found to have 
positive effects by increasing a sense of accomplishment, interest 
in a task, and persistence in achieving it. Related work has shown 
that setting smaller, achievable goals has helped people save more
effectively and pay off their debts. They worked particularly well
when people had a single savings goal instead of pursuing multiple
goals simultaneously. One simple illustration of the effectiveness
of sub-goals was provided by Dan Ariely, author of the illuminating
book Predictably Irrational, and Klaus Wertenbroch. They
asked one set of students to proofread three essays in three weeks, 
another group to proofread one essay each week over the course 
of three weeks, and a final group to set their own schedule. The 
group with sub-goals (one essay each week over the three-week
period) outperformed the groups with one final goal only, both
in terms of timeliness and in terms of accuracy. Not only did they 
get more of the work done, but the quality of their work was better. 
Furthermore, when given the option, students generally imposed 
sub-goals on themselves.

Public accountability matters. Holding organizations account- 
able by asking them to “explain” when they have not “complied” 
is another lever that helps people follow through on their good 
intentions. In fact, the literature on accountability suggests that 
people charged with evaluating others are less likely to rely on ste- 
reotypes if they have to explain their choices. 

In one experiment conducted in Israel, female students at a 
teachers’ college were asked to evaluate an essay written by an 
eighth grader on “an interesting event that happened to me.” 
Everyone received the same essay, and everyone was informed that 
such evaluations were an essential part of a teacher’s job. Each was 
told to evaluate the essay’s literary merits on a scale from 1 to 100. 
However, one group of evaluators was informed that the student’s 
ethnic origin was Ashkenazi (originally from Europe or America) 
and another group was informed that the author was Sephardi 
(originally from Asia or Africa); a final group was told nothing
about the author’s ethnicity. Finally, to examine the impact of
accountability on ratings, the researchers told half of the student
teachers rating the essay that the study they were participating in 
was intended to assess their evaluative ability and that they would 
have to publicly explain their assessment. In addition, they would 
eventually be able to compare their evaluation with that of an 
experienced teacher. In contrast, the other half of the student 
teachers were told that the purpose of the study was to better 
understand differences in evaluation styles. As in other research,
the ethnic origin of the teachers rating the essay did not matter.
However, the writer’s identity proved highly significant. Generally,
the essay was evaluated more favorably when it carried an Ashkenazi
name than when it carried a Sephardi name, with one exception:
when raters knew they were to be held publicly accountable for 
their evaluations, the pro-Ashkenazi bias disappeared. 

Generally, accountability is more likely to attenuate bias when 
people confront an audience that is well informed, interested in 
accuracy, and has a legitimate reason to probe. Put simply, it works 
because people care about what others think. There are two impor- 
tant caveats. First, accountability works better when people know 
beforehand that they will be held accountable. Otherwise, they 
might become defensive and try to rationalize their behavior rather 
than improve their procedures. Second, and this is a tall order, in 
an ideal world people would not know their evaluators’ views be- 
forehand. The more they know, the more they are tempted to 
superficially conform to what they believe the audience wants to 
hear, without actually scrutinizing their arguments or changing 
underlying processes to make them better. However, while a the- 
oretical ideal, I doubt that this is practicable in most organizational 
settings. Instead, organizations have found other ways to make sure 
that talk is not cheap.

In their otherwise depressing review of the efficacy of diversity
training programs, Frank Dobbin and colleagues found accountability
to be one of the most important mechanisms related
to the diversity of the labor force. Assigning responsibility for man- 
aging diversity to taskforces, diversity officers, or some similar 
committee was strongly associated with an increase in workforce 
diversity, including in the fraction of women. To better under- 
stand why accountability was related to an increase in diversity but 
diversity training was not, the authors conducted a number of 
interviews in a subset of the firms they originally studied in At- 
lanta, Boston, San Francisco, and Chicago. Human resource and 
line managers reported that taskforces and diversity officers were 
good at identifying problems as well as at suggesting remedies. And 
they held managers accountable, acting as their conscience and monitoring
whether they followed through with agreed-upon initiatives.
But these were not always formal reviews. Rather, one study sug- 
gests that accountability can work even when no one is formally 
assigned to review behavior. One reason talk is not cheap is that
most people, most of the time, do not like to lie.

An ingenious study examining voting behavior suggests how 
and why. During the 2010 midterm congressional elections, which 
historically see abysmal turnouts, Stefano DellaVigna, John List, 
Ulrike Malmendier, and Gautam Rao set out to find what the 
value of voting was to someone who thought others would ask if 
they'd bothered to vote. It turns out the answer is about $10 to 
$15. That is the amount of money people were willing to spend 
to be able to honestly tell others that they had voted. This insight 
suggests that managers do not need to answer to a formal taskforce
to be nudged by accountability; just a belief that they might
have to answer to someone can do the trick.

People care about what others think, and so do companies.
This is the power of transparency. It enables everyone to make 
more informed choices, whether by buying a particular car because 
of its energy efficiency, by paying down debt more quickly because
of the long-term consequences, or by working for or investing 
in a company that is more diverse and pays more equitably. 
Holding organizations accountable through disclosure and 
comply-or-explain approaches can make compliance the soft de- 
fault. And the more behaviorally smart the shared information, 
the more likely it is to move the needle toward increased
transparency and gender equality.


Designing Gender Equality—Behaviorally Informed 
Disclosure and Accountability 


• Make information salient, simple, and comparable.

• Set both long-term targets and specific, short-term,
achievable goals.

• Hold people and organizations accountable for their
follow-through.


Designing Change 


If I could send you off with one big takeaway, it would be this: we 
can reduce gender inequality. We will use all we know about how 
the mind works, how biases influence decisions and outcomes, and 
how behavioral design can alter these. We can effect this change 
not in a matter of decades but in a matter of years. Even good 
design cannot solve all our problems. But behavioral design is the 
most useful and underutilized tool we have. Truth be told, we col- 
lectively can do this only if you take part. 

I wrote What Works to give you the research-based insights, 
the confidence, and the practical advice necessary for designing 
your own changes at work or school. In this last chapter, I want 
to make it easier still. Let me start by introducing an acronym that 
may help you remember the promise of DESIGN. 


D: for data
E: for experiment
SIGN: for signpost


Good behavioral design starts with data. How many men and
women has your company hired and promoted, to what positions
and at what salaries, over the past five years? Are boys and girls in
your school gaining proficiency at reading, staying the same, or 
becoming less proficient? How many of the portraits in your 
organization’s lobby or conference rooms are of women, and how 
many are of men? 
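
As a starting point, questions like these can be answered with a very simple tally. The sketch below is a minimal illustration in Python; the records are invented, and the field layout and function name are assumptions rather than a prescribed format.

```python
from collections import defaultdict

# Invented example records (assumption): one (gender, event) pair
# per personnel decision over the period being reviewed.
records = [
    ("woman", "hired"), ("man", "hired"), ("man", "hired"),
    ("woman", "promoted"), ("man", "promoted"),
    ("man", "promoted"), ("man", "promoted"),
]


def share_of_women(records):
    """Return, for each event type, the fraction of cases involving women."""
    counts = defaultdict(lambda: defaultdict(int))
    for gender, event in records:
        counts[event][gender] += 1
    shares = {}
    for event, by_gender in counts.items():
        total = sum(by_gender.values())
        shares[event] = by_gender.get("woman", 0) / total
    return shares


for event, share in share_of_women(records).items():
    print(f"{event}: {share:.0%} women")
```

Comparing the share of women among hires with the share among promotions is one simple way to see where a gap might open up; the same tally could be repeated by position or salary band.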

Armed with data, a behavioral designer must experiment. You 
do not start off in a design-free environment. In that sense, you 
are already participating in an experiment, just unknowingly and 
without the benefit of having a control group available. When you 
design your experiment, do so knowingly and responsibly. Adhere 
to the ethical standards set by entities such as the Institutional
Review Boards (IRBs) overseeing experimentation at universities.

Experiment with new signposts—on restaurant doors, in in- 
terview protocols, or on office walls—that use insights about 
human behavior to point people in more desirable directions. Re- 
member the hotel key cards that automatically turn room lights on 
and off? Find signposts that, like these keys, make it easy for our 
biased minds to make unbiased choices. Do not focus on changing 
minds—the very purpose of signposts is to help us find the way 
without having to memorize or even think much about it. 

So in brief: collect data to understand whether and why there 
is gender inequality; experiment with what might close gender 
gaps; and, informed by behavioral insights, create signposts that
nudge behavior toward more equality. Finally, be sure to let colleagues
know that in embracing DESIGN, they are joining an
increasing number of governments, corporations, schools, and 
universities that, responding to the promise of behavioral insights 
to change behavior for good, have done the same. 

Consider savings. Millions have been added to people’s retirement
accounts in Denmark, New Zealand, the United Kingdom,
the United States, and elsewhere through auto-enrollment,
automatic employer contributions, active choice, and Save More 
Tomorrow plans where employees commit future rather than 
present earnings to savings accounts. To put this in perspective, 
recall the evidence from Denmark discussed in the first chapter. 
Every dollar of government expenditure on subsidies led to an in- 
crease in savings by one cent—but the United States keeps spending 
about $100 billion and the United Kingdom about $30 billion per 
year on subsidies. 

There are many more success stories. In the United Kingdom, 
the Behavioral Insights Team has helped increase the poor’s uni- 
versity enrollment rates by 25 percent with pre-filled application 
forms, increased payment of taxes by up to 16 percent by reminding 
taxpayers of the prevalent norms, and encouraged healthy eating 
by having employees make their choices ahead of time. It has done 
so through data collection, experimentation, and careful evalua- 
tion, as David Halpern’s insightful 2015 book, Inside the Nudge 
Unit, explains. The recently established What Works institutes in 
the United Kingdom provide practitioners with actionable intel- 
ligence ranging from how to enhance educational attainment in 
schools to how to boost local growth. Building on these successes 
in other areas, the time is right to address gender equality.

Many other countries have launched initiatives similar to the 
UK’s Behavioral Insights Team, including Australia, Austria, Den- 
mark, Finland, Germany, Guatemala, Mexico, the Netherlands, 
Norway, Singapore, South Africa, Sweden, and the United States. 
And so have international organizations. The World Bank’s 
flagship report, the World Development Report, was devoted to behavioral
insights; it launched a Global Insights Initiative in 2015.
President Jim Yong Kim writes: “The promise of this approach to
decision making and behavior is enormous, and its scope of application
is extremely wide... This year’s World Development
Report . . . introduces an important new agenda for the develop- 
ment community going forward.” Berkeley, Carnegie Mellon, 
Chicago, Harvard, Princeton, and the University of Pennsylvania, 
among other universities, are at the forefront of behavioral 
insights. At Harvard, the Behavioral Insights Group that I co- 
chair with Max Bazerman is located at the Kennedy School’s 
Center for Public Leadership because applying behavioral insights 
is a key leadership skill. A good leader is a behavioral designer.

Leaders need to do many things. Most fundamentally, leaders 
need followers. Political leaders are well advised to take note of 
research by Todd Rogers and his colleagues on how to increase 
voter turnout. Barack Obama and Prime Minister Narendra Modi 
did, and it served them well. Helping voters make plans on when 
to vote and how to get there and reminding them that most others 
are voting, too, increases turnout, more so than the traditional 
strategies used, and at very low cost.

Another important leadership task is to find talent. The police 
force in the United Kingdom substantially increased the talent pool 
by adopting a friendlier tone and asking people a simple question 
before they took the entry exam: Why do you want to join the 
police and why does it matter to your community? Hearing a dif- 
ferent tone and being nudged to think about what motivated them 
increased the pass rate of minority group applicants by 50 percent. 
Making a small change that had a big impact required no more 
than the insights and experimentation of a few creative thinkers 
at the UK Behavioral Insights Team. 

Finally, a leader will want to motivate people to give it their
best, both in terms of how much value they produce and how they
behave. Consider another subtle design based on research by Scott
Wiltermuth and Francesca Gino: people work harder when re- 
wards are grouped into categories. The same bonuses, for example, 
two all-expenses-paid vacations, feel different when they come 
from two different “buckets.” When two goals are set, people work 
harder to achieve the second goal when the incentives—two free 
trips—are labeled differently. Though the trips can be identical, if 
people first work for the “blue bucket” trip and then have a shot 
at the “red bucket” trip, more than three times as many people 
continue to work hard to achieve the second goal. Even after 
earning a trip from the blue bucket, people feel that they would 
be “missing out” if they did not also try for a trip from the red 
bucket. 

Good leadership does not end with productivity. Leaders must 
also make and promote ethical choices. Much insight has been 
gained on the blind spots that keep us from doing the right thing 
in the emerging field of behavioral ethics. The same design prin- 
ciples apply. To overcome the tension between what we want and 
what we know we should do, locking in our future choices today 
helps us control not only our retirement savings and our calorie 
consumption, but also our moral decisions. We are more likely to 
do what is right when the choices will be implemented in the 
distant future rather than today or tomorrow. A leader-designer, 
thus, will not make structural changes overnight, but instead, 
for example, secure a commitment to roll out a new structured 
interview protocol in a few months.

Among corporations, few have gone further to embrace 
DESIGN principles than Google. Like most tech companies, 
Google has a long way to go in terms of gender equality. But
data led Google to introduce employee training to identify
unconscious bias. After reading the evidence, Laszlo Bock, head
of Google’s People Operations, started to wonder how bias 
might play out at Google: “This is a pretty genteel environment, 
and you don’t usually see outright manifestations of bias,” he said. 
“Occasionally you'll have some idiot do something stupid and 
hurtful, and I like to fire those people.” He suspected that most of 
the manifestations of biases at Google were happening behind 
closed doors, hidden to most, even to the employees influenced 
by them. And, indeed, after taking the Implicit Association Test, 
he reported: “Suddenly you go from being completely oblivious 
to going, ‘Oh my god, it’s everywhere.’”

The IAT, https://implicit.harvard.edu, is just one of the tools 
you can use to start the diagnosis. Another one is EDGE,
www.edge-cert.org. Once you understand both your own data and
what is going on in your organization and needs fixing, you can 
move on to experimentation. For a summary of what has al- 
ready been learned, I recommend the Gender Action Portal, 
http://gap.hks.harvard.edu/. The Women and Public Policy
Program’s free online platform allows you to search for user- 
friendly summaries of scientific evidence—based on experi- 
ments in the field and the laboratory—on what works to close 
gender gaps in economic opportunity, political participation, 
health, and education. If your concern is potential bias in your 
hiring, I recommend you try out a tool such as Applied,
www.beapplied.com, which allows you to easily employ a more
structured approach.

You don’t have to wait. You can begin designing immediately. 
I have offered you thirty-six research-grounded design suggestions
in this book. As you get ready to try some of them, let me summarize
some key design principles, focusing on the four areas that
we have covered in this book: training, talent management, school
and work, and diversity. These become useful shorthand aspirations
as you introduce any single or several designs.


1. Training: Move from “training” to “capacity building.” 

2. Talent Management: Move from “intuition” to “data” and 
“structure.” 

3. School and Work: Move from an “uneven” to an “even 
playing field.” 

4. Diversity: Move from a “numbers game” to the “conditions
for success.”


Let me end on a final example, one from my backyard, Boston, 
in which the Women and Public Policy Program (wappp.hks.harvard.edu)
has been heavily involved. On April 9, 2013, Boston’s
then mayor, Thomas Menino, established the Women’s Work- 
force Council, composed of leaders from across business, gov- 
ernment, nonprofits, and academia. The council’s mission is to 
help “close the gender wage gap and remove the visible and invis- 
ible barriers to women’s advancement in today’s working world.” 
It produced a report outlining what companies, agencies, and not- 
for-profit organizations could be doing to reduce the gender 
wage gap, and a compact, titled “100% Talent,” that organizations 
could sign, declaring that they were willing to try out at least 
three research-based interventions and let researchers evaluate 
their impact. As I write this book, fifty companies have signed the 
compact, and the council is in the midst of collaborating with the 
firms on data collection. Many of these companies are collecting 
and analyzing their data by gender for the first time. A huge 
success for the Women’s Workforce Council, it is also the beginning
of a journey that will allow these firms to use proven designs
to fix what is broken.

Companies, universities, and governments from around the 
world have begun a quest to design gender equality. We can move 
the needle toward a fairer and better world today. Now it is up
to you.